Introns and exons of the cystic fibrosis gene and mutations thereof

ABSTRACT

The identification, isolation and cloning of DNA sequences coding for mutant forms of the cystic fibrosis gene and their gene product are described. DNA sequence information and information relating to the genomic structure of the cystic fibrosis gene are provided. The mutant forms of the CF gene include specific sequence alterations in coding portions or of other genetic information at exon/intron boundaries and altered RNA transcripts and mutant protein products. Such DNA and protein information is useful in developing DNA or protein diagnosis for CF mutations, carrier and patient screening, as well as cloning of mutant genes and manufacturing of their proteins for investigation into therapies for cystic fibrosis.

FIELD OF THE INVENTION

The present invention relates generally to the cystic fibrosis (CF) gene, and, more particularly, to the identification, isolation and cloning of the DNA sequence corresponding to mutants of the CF gene, as well as their transcripts, gene products and genetic information at exon/intron boundaries. The present invention also relates to methods of screening for and detection of CF carriers, CF diagnosis, prenatal CF screening and diagnosis, and gene therapy utilizing recombinant technologies and drug therapy using the information derived from the DNA, protein, and the metabolic function of the protein.

BACKGROUND OF THE INVENTION

Cystic fibrosis (CF) is the most common severe autosomal recessive genetic disorder in the Caucasian population. It affects approximately 1 in 2000 live births in North America [Boat et al, The Metabolic Basis of Inherited Disease, 6th ed, pp 2649-2680, McGraw Hill, NY (1989)]. Approximately 1 in 20 persons are carriers of the disease.
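As a rough consistency check on the two figures quoted above, the carrier frequency can be estimated from the incidence under Hardy-Weinberg equilibrium. That model (random mating, a single recessive locus, all affected births counted) is an assumption introduced here for illustration and is not stated in the text. The function name `carrier_frequency` is likewise hypothetical.

```python
import math

def carrier_frequency(incidence: float) -> float:
    """Estimate heterozygote (carrier) frequency 2pq from the incidence q^2.

    Assumes Hardy-Weinberg equilibrium for a single autosomal recessive locus.
    """
    q = math.sqrt(incidence)   # frequency of the mutant allele
    p = 1.0 - q                # frequency of the normal allele
    return 2.0 * p * q

incidence = 1 / 2000           # affected live births, as quoted in the text
carriers = carrier_frequency(incidence)
print(f"estimated carrier frequency: about 1 in {1 / carriers:.0f}")
```

Under these assumptions the estimate comes out at roughly 1 in 23, which is of the same order as the "approximately 1 in 20" figure given above.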

Although the disease was first described in the late 1930's, the basic defect remains unknown. The major symptoms of cystic fibrosis include chronic pulmonary disease, pancreatic exocrine insufficiency, and elevated sweat electrolyte levels. The symptoms are consistent with cystic fibrosis being an exocrine disorder. Although recent advances have been made in the analysis of ion transport across the apical membrane of the epithelium of CF patient cells, it is not clear that the abnormal regulation of chloride channels represents the primary defect in the disease. Given the lack of understanding of the molecular mechanism of the disease, an alternative approach has therefore been taken in an attempt to understand the nature of the molecular defect through direct cloning of the responsible gene on the basis of its chromosomal location.

However, there is no clear phenotype that directs an approach to the exact nature of the genetic basis of the disease, or that allows for an identification of the cystic fibrosis gene. The nature of the CF defect in relation to the population genetics data has not been readily apparent. Both the prevalence of the disease and the clinical heterogeneity have been explained by several different mechanisms: high mutation rate, heterozygote advantage, genetic drift, multiple loci, and reproductive compensation.

Many of the hypotheses cannot be tested due to the lack of knowledge of the basic defect. Therefore, alternative approaches to the determination and characterization of the CF gene have focused on an attempt to identify the location of the gene by genetic analysis.

Linkage analysis of the CF gene to antigenic and protein markers was attempted in the 1950's, but no positive results were obtained [Steinberg et al Am. J. Hum. Genet. 8: 162-176, (1956); Steinberg and Morton Am. J. Hum. Genet 8: 177-189, (1956); Goodchild et al J. Med. Genet. 7: 417-419, 1976].

More recently, it has become possible to use RFLP's to facilitate linkage analysis. The first linkage of an RFLP marker to the CF gene was disclosed in 1985 [Tsui et al. Science 230: 1054-1057, 1985], in which linkage was found between the CF gene and an uncharacterized marker DOCRI-917. The association was found in an analysis of 39 families with affected CF children. This showed that although the chromosomal location had not been established, the location of the disease gene had been narrowed to about 1% of the human genome, or about 30 million nucleotide base pairs.

The chromosomal location of the DOCRI-917 probe was established using rodent-human hybrid cell lines containing different human chromosome complements. It was shown that DOCRI-917 (and therefore the CF gene) maps to human chromosome 7.

Further physical and genetic linkage studies were pursued in an attempt to pinpoint the location of the CF gene. Zengerling et al [Am. J. Hum. Genet. 40: 228-236 (1987)] describe the use of human-mouse somatic cell hybrids to obtain a more detailed physical relationship between the CF gene and the markers known to be linked with it. This publication shows that the CF gene can be assigned to either the distal region of band q22 or the proximal region of band q31 on chromosome 7.

Rommens et al [Am. J. Hum. Genet. 43: 645-663, (1988)] give a detailed discussion of the isolation of many new 7q31 probes. The approach outlined led to the isolation of two new probes, D7S122 and D7S340, which are close to each other. Pulsed field gel electrophoresis mapping indicates that these two RFLP markers are between two markers known to flank the CF gene, MET [White, R., Woodward S., Leppert M., et al. Nature 318: 382-384, (1985)] and D7S8 [Wainwright, B. J., Scambler, P. J., and J. Schmidtke, Nature 318: 384-385 (1985)], and are therefore in the CF gene region. The discovery of these markers provides a starting point for chromosome walking and jumping.

Estivill et al, [Nature 326: 840-845 (1987)] disclose that a candidate cDNA gene was located and partially characterized. This, however, does not teach the correct location of the CF gene. The reference discloses a candidate cDNA gene downstream of a CpG island; such islands are undermethylated GC nucleotide-rich regions upstream of many vertebrate genes. The chromosomal localization of the candidate locus is identified as the XV2C region. This region is described in European Patent Application 88303645.1. However, that actual region does not include the CF gene.

A major difficulty in identifying the CF gene has been the lack of cytologically detectable chromosome rearrangements or deletions, which greatly facilitated all previous successes in the cloning of human disease genes by knowledge of map position.

Such rearrangements and deletions could be observed cytologically and as a result, a physical location on a particular chromosome could be correlated with the particular disease. Further, this cytological location could be correlated with a molecular location based on known relationship between publicly available DNA probes and cytologically visible alterations in the chromosomes. Knowledge of the molecular location of the gene for a particular disease would allow cloning and sequencing of that gene by routine procedures, particularly when the gene product is known and cloning success can be confirmed by immunoassay of expression products of the cloned genes.

In contrast, neither the cytological location nor the gene product of the gene for cystic fibrosis was known in the prior art. With the recent identification of MET and D7S8, markers which flanked the CF gene but did not pinpoint its molecular location, the present inventors devised various novel gene cloning strategies to approach the CF gene in accordance with the present invention. The methods employed in these strategies include chromosome jumping from the flanking markers, cloning of DNA fragments from a defined physical region with the use of pulsed field gel electrophoresis, a combination of somatic cell hybrid and molecular cloning techniques designed to isolate DNA fragments from undermethylated CpG islands near CF, chromosome microdissection and cloning, and saturation cloning of a large number of DNA markers from the 7q31 region. By means of these novel strategies, the present inventors were able to identify the gene responsible for cystic fibrosis where the prior art was uncertain or, even in one case, wrong.

The application of these genetic and molecular cloning strategies has allowed the isolation and cDNA cloning of the cystic fibrosis gene on the basis of its chromosomal location, without the benefit of genomic rearrangements to point the way. The identification of the normal and mutant forms of the CF gene and gene products has allowed for the development of screening and diagnostic tests for CF utilizing nucleic acid probes and antibodies to the gene product. Through interaction with the defective gene product and the pathway in which this gene product is involved, therapy through normal gene product supplementation and gene manipulation and delivery are now made possible.

The gene involved in the cystic fibrosis disease process, hereinafter the "CF gene" and its functional equivalents, has been identified, isolated and cDNA cloned, and its transcripts and gene products identified and sequenced. A three base pair deletion leading to the omission of a phenylalanine residue in the gene product has been determined to correspond to the mutations of the CF gene in approximately 70% of the patients affected with CF, with different mutations involved in most if not all the remaining cases. This subject matter is disclosed in co-pending U.S. patent application Ser. No. 396,894 filed Aug. 22, 1989 and its related continuation-in-part applications Ser. No. 399,945 filed Aug. 24, 1989 and Ser. No. 401,609 filed Aug. 31, 1989.

SUMMARY OF THE INVENTION

According to this invention, other base pair deletions or alterations leading to the omission of amino acid residues in the gene product have been determined. According to this invention other nucleotide deletions or alterations leading to mutations in the DNA sequence resulting in frameshift or splice mutations have been determined.

With the identification and sequencing of the mutant gene and its gene product, nucleic acid probes and antibodies raised to the mutant gene product can be used in a variety of hybridization and immunological assays to screen for and detect the presence of either the defective CF gene or gene product. Assay kits for such screening and diagnosis can also be provided. The genetic information derived from the intron/exon boundaries is also very useful in various screening and diagnosis procedures.

Patient therapy through supplementation with the normal gene product, whose production can be amplified using genetic and recombinant techniques, or its functional equivalent, is now also possible. Correction or modification of the defective gene product through drug treatment means is now possible. In addition, cystic fibrosis can be cured or controlled through gene therapy by correcting the gene defect in situ or using recombinant or other vehicles to deliver a DNA sequence capable of expression of the normal gene product to the cells of the patient.

According to another aspect of the invention, a purified mutant CF gene comprises a DNA sequence encoding an amino acid sequence for a protein where the protein, when expressed in cells of the human body, is associated with altered cell function which correlates with the genetic disease cystic fibrosis.

According to another aspect of the invention, a purified RNA molecule comprises an RNA sequence corresponding to the above DNA sequence.

According to another aspect of the invention, a DNA molecule comprises a cDNA molecule corresponding to the above DNA sequence.

According to another aspect of the invention, a DNA molecule comprises a DNA sequence SEQ ID NO: 1 encoding mutant CFTR polypeptide as further characterized by nucleotide sequence variants resulting in deletion or alteration of amino acids at residue positions 85, 148, 178, 455, 493, 507, 542, 549, 551, 560, 563, 574, 1077 and 1092.

According to another aspect of the invention, a DNA molecule comprises an intronless DNA sequence encoding a mutant CFTR polypeptide having the sequence according to FIG. 1 for DNA sequence positions 1 to 4575 and, further characterized by nucleotide sequence variants resulting in deletion or alteration of DNA at DNA sequence positions 129, 556, 621+1, 711+1, 1717-1 and 3659.

According to another aspect of the invention, a DNA molecule comprises a cDNA molecule corresponding to the above DNA sequence.

According to another aspect of the invention, the cDNA molecule comprises a DNA sequence selected from the group consisting of:

(a) DNA sequences which correspond to the mutant DNA sequence selected from the group of mutant amino acid positions of 85, 148, 178, 455, 493, 507, 542, 549, 551, 560, 563, 574, 1077 and 1092 and mutant DNA sequence positions 129, 556, 621+1, 711+1, 1717-1 and 3659 and which encode, on expression, for mutant CFTR polypeptide;

(b) DNA sequences which correspond to a fragment of the selected mutant DNA sequence, including at least twenty nucleotides;

(c) DNA sequences which comprise at least twenty nucleotides and encode a fragment of the selected mutant CFTR protein amino acid sequence;

(d) DNA sequences encoding an epitope encoded by at least eighteen sequential nucleotides in the selected mutant DNA sequence.

According to another aspect of the invention, a DNA sequence selected from the group consisting of:

(a) DNA sequences which correspond to portions of DNA sequences of boundaries of exons/introns of the genomic CF gene;

(b) DNA sequences of at least eighteen sequential nucleotides at boundaries of exons/introns of the genomic CF gene depicted in FIG. 18; and

(c) DNA sequences of at least eighteen sequential nucleotides of intron portions of the genomic CF gene of FIG. 18.

According to another aspect of the invention, a purified nucleic acid probe comprises a DNA or RNA nucleotide sequence corresponding to the above noted selected DNA sequences of groups (a) to (c).

According to another aspect of the invention, a purified RNA molecule comprises an RNA sequence corresponding to the mutant DNA sequence selected from the group of mutant protein positions consisting of 85, 148, 178, 455, 493, 507, 542, 549, 551, 560, 563, 574, 1077 and 1092 and of mutant DNA sequence positions consisting of 129, 556, 621+1, 711+1, 1717-1 and 3659.

A purified nucleic acid probe comprises a DNA or RNA nucleotide sequence corresponding to the mutant sequences of the above recited group.

According to another aspect of the invention, a recombinant cloning vector comprising the DNA sequences of the mutant DNA and fragments thereof selected from the group of mutant protein positions consisting of 85, 148, 178, 455, 493, 507, 542, 549, 551, 563, 574, 1077 and 1092 and selected from the group of mutant DNA sequence positions consisting of 129, 556, 621+1, 711+1, 1717-1 and 3659 is provided. The vector, according to an aspect of this invention, is operatively linked to an expression control sequence in the recombinant DNA molecule so that the selected mutant DNA sequences for the mutant CFTR polypeptide can be expressed. The expression control sequence is selected from the group consisting of sequences that control the expression of genes of prokaryotic or eukaryotic cells and their viruses and combinations thereof.

According to another aspect of the invention, a method for producing a mutant CFTR polypeptide comprises the steps of:

(a) culturing a host cell transfected with the recombinant vector for the mutant DNA sequence in a medium and under conditions favorable for expression of the mutant CFTR polypeptide selected from the group of mutant CFTR polypeptides at mutant protein positions 85, 148, 178, 455, 493, 507, 542, 549, 551, 560, 563, 574, 1077 and 1092 and mutant DNA sequence positions 129, 556, 621+1, 711+1, 1717-1 and 3659; and

(b) isolating the expressed mutant CFTR polypeptide.

According to another aspect of the invention, a purified protein of human cell membrane origin comprises an amino acid sequence encoded by the mutant DNA sequences selected from the group of mutant protein positions of 85, 148, 178, 455, 493, 507, 542, 549, 551, 560, 563, 574, 1077 and 1092 and from the group of mutant DNA sequence positions 129, 556, 621+1, 711+1, 1717-1 and 3659 where the protein, when present in human cell membrane, is associated with cell function which causes the genetic disease cystic fibrosis.

According to another aspect of the invention, a method is provided for screening a subject to determine if the subject is a CF carrier or a CF patient comprising the steps of providing a biological sample of the subject to be screened and providing an assay for detecting in the biological sample, the presence of at least a member from the group consisting of:

(a) mutant CF gene selected from the group of mutant protein positions 85, 148, 178, 455, 493, 507, 542, 549, 551, 560, 563, 574, 1077 and 1092 and from the group of mutant DNA sequence positions 129, 556, 621+1, 711+1, 1717-1 and 3659;

(b) mutant CF gene products and mixtures thereof;

(c) DNA sequences which correspond to portions of DNA sequences of boundaries of exons/introns of the genomic CF gene;

(d) DNA sequences of at least eighteen sequential nucleotides at boundaries of exons/introns of the genomic CF gene depicted in FIG. 18; and

(e) DNA sequences of at least eighteen sequential nucleotides of intron portions of the genomic CF gene of FIG. 18.

According to another aspect of the invention, a kit for assaying for the presence of a CF gene by immunoassay techniques comprises:

(a) an antibody which specifically binds to a gene product of the mutant DNA sequence selected from the group of mutant protein positions 85, 148, 178, 455, 493, 507, 542, 549, 551, 560, 563, 574, 1077 and 1092 and from the group of mutant DNA sequence positions 129, 556, 621+1, 711+1, 1717-1 and 3659;

(b) reagent means for detecting the binding of the antibody to the gene product; and

(c) the antibody and reagent means each being present in amounts effective to perform the immunoassay.

According to another aspect of the invention, a kit for assaying for the presence of a mutant CF gene by hybridization technique comprises:

(a) an oligonucleotide probe which specifically binds to the mutant CF gene having a mutation at a protein position selected from the group consisting of 85, 148, 178, 455, 493, 507, 542, 549, 551, 560, 563, 574, 1077 and 1092 or having a mutation at a DNA sequence position selected from the group consisting of 129, 556, 621+1, 711+1, 1717-1 and 3659;

(b) reagent means for detecting the hybridization of the oligonucleotide probe to the mutant CF gene; and

(c) the probe and reagent means each being present in amounts effective to perform the hybridization assay.

According to another aspect of the invention, an animal comprises a heterologous cell system. The cell system includes a recombinant cloning vector which includes the recombinant DNA sequence corresponding to the mutant DNA sequence which induces cystic fibrosis symptoms in the animal.

According to another aspect of the invention, in a polymerase chain reaction to amplify a selected exon of a cDNA sequence of FIG. 1, the use of oligonucleotide primers from intron portions near the 5' and 3' boundaries of the selected exon of FIG. 18 is provided.
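The idea of placing primers in the flanking introns so that the whole exon is amplified can be sketched in silico as follows. This is a minimal illustration only: the sequences are made-up toy strings (not taken from FIG. 18), the helper names `reverse_complement` and `amplify` are hypothetical, and the model ignores melting temperature, mismatches and every other practical aspect of primer design.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))

def amplify(template: str, fwd_primer: str, rev_primer: str):
    """Return the template region bounded by the two primers, or None.

    The forward primer matches the top strand; the reverse primer is
    written 5'->3' on the bottom strand, so its binding site on the top
    strand is its reverse complement.
    """
    start = template.find(fwd_primer)
    if start == -1:
        return None
    rev_site = reverse_complement(rev_primer)
    end = template.find(rev_site, start + len(fwd_primer))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]

# Toy genomic segment: 5' intron + exon + 3' intron (hypothetical sequences).
intron_5 = "GATTACAGGCTTAACCGTAG"
exon = "ATGGCCATTGTAATGGGCCGC"
intron_3 = "GTAAGTCCTAGGTTCAAGCT"
genomic = intron_5 + exon + intron_3

# Primers are chosen inside the introns, near the exon boundaries.
fwd = intron_5[2:14]
rev = reverse_complement(intron_3[6:18])

amplicon = amplify(genomic, fwd, rev)
print(exon in amplicon)  # the complete exon lies inside the amplified product
```

Because both primers sit in intron sequence, the amplified product carries the complete exon together with its splice junctions, which is what allows variants at positions such as 621+1 or 1717-1 to be examined from genomic DNA.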

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is the nucleotide sequence of the CF gene and the amino acid sequence of the CFTR protein, with Δ indicating mutations at the 507 and 508 protein positions.

FIG. 2 is a restriction map of the CF gene and the schematic strategy used to chromosome walk and jump to the gene.

FIGS. 3A through 3E depict the physical map of the region including and surrounding the CF gene generated by pulsed field gel electrophoresis. FIGS. 3A, 3B, 3C, and 3D show hybridization data for the restriction enzymes Sal I, Xho I, Sfi I, and Nae I, respectively, generated by representative genomic and cDNA probes which span the region. The deduced physical map for each restriction enzyme is shown below each panel. FIG. 3E shows a composite map of the entire MET-D7S8 interval (J. M. Rommens et al., Am. J. Hum. Genet. 45:932-941, 1990). The open boxed segment indicates the portion cloned by chromosome walking and jumping, and the filled arrow indicates the portion covered by the CF transcript.

FIGS. 4A, 4B and 4C show the detection of conserved nucleotide sequences by cross-species hybridization.

FIG. 4D is a restriction map of overlapping segments of probes E4.3 and H1.6.

FIG. 5 is an RNA blot hybridization analysis using genomic and cDNA probes. Hybridization to RNA of: A-fibroblast with cDNA probe G-2; B-trachea (from unafflicted and CF patient individuals), pancreas, liver, HL60 cell line and brain with genomic probe CF16; C-T84 cell line with cDNA probe 10-1.

FIG. 6 is the methylation status of the E4.3 cloned region at the 5' end of the CF gene.

FIG. 7 is a restriction map of the CFTR cDNA showing alignment of the cDNA to the genomic DNA fragments.

FIG. 8 is an RNA gel blot analysis depicting hybridization by a portion of the CFTR cDNA (clone 10-1) to a 6.5 kb mRNA transcript in various human tissues.

FIGS. 9A, 9B, 9C and 9D are DNA blot hybridization analyses depicting hybridization by the CFTR cDNA clones to genomic DNA digested with EcoRI and Hind III.

FIGS. 10A, 10B and 10C are primer extension experiments which characterize the 5' and 3' ends of the CFTR cDNA.

FIG. 11 is a hydropathy profile and shows predicted secondary structures of CFTR.

FIG. 12 is a dot matrix analysis of internal homologies in the predicted CFTR polypeptide.

FIG. 13 is a schematic model of the predicted CFTR protein.

FIG. 14 is a schematic diagram of the restriction fragment length polymorphisms (RFLP's) closely linked to the CF gene where the inverted triangle indicates the location of the F508 3 base pair deletion.

FIG. 15 represents alignment of the most conserved segments of the extended NBFs of CFTR with comparable regions of other proteins.

FIGS. 16A and 16B show the DNA sequence around the F508 deletion.

FIGS. 17A and 17B are representations of the nucleotide sequencing gels showing the DNA sequence at the F508 deletion.

FIG. 18 is the nucleotide sequence of the portions of introns and complete exons of the genomic CF gene for 27 exons identified and numbered sequentially as 1 through 24 with additional exons 6a, 6b, 14a, 14b and 17a, 17b of cDNA sequence of FIG. 1;

FIGS. 19A, 19B and 19C show the results of amplification of genomic DNA using intron oligonucleotides bounding exon 10 to note differences in the nucleotide sequence;

FIG. 20 shows the separation by gel electrophoresis of the amplified genomic DNA products of a CF family; and

FIG. 21 is a restriction mapping of cloned intron and exon portions of genomic DNA, which introns and exons are identified in FIG. 18.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

1. Definitions

In order to facilitate review of the various embodiments of the invention and an understanding of various elements and constituents used in making the invention and using same, the following definitions of terms used in the invention description are provided:

CF--cystic fibrosis

CF carrier--a person in apparent health whose chromosomes contain a mutant CF gene that may be transmitted to that person's offspring.

CF patient--a person who carries a mutant CF gene on each chromosome, such that they exhibit the clinical symptoms of cystic fibrosis.

CF gene--the gene whose mutant forms are associated with the disease cystic fibrosis. This definition is understood to include the various sequence polymorphisms that exist, wherein nucleotide substitutions in the gene sequence do not affect the essential function of the gene product. This term primarily relates to an isolated coding sequence, but can also include some or all of the flanking regulatory elements and/or introns.

Genomic CF gene--the CF gene which includes flanking regulatory elements and/or introns at boundaries of exons of the CF gene.

CF-PI--cystic fibrosis pancreatic insufficient, the major clinical subgroup of cystic fibrosis patients, characterized by insufficient pancreatic exocrine function.

CF-PS--cystic fibrosis pancreatic sufficient, a clinical subgroup of cystic fibrosis patients with sufficient pancreatic exocrine function for normal digestion of food.

CFTR--cystic fibrosis transmembrane conductance regulator protein, encoded by the CF gene. This definition includes the protein as isolated from human or animal sources, as produced by recombinant organisms, and as chemically or enzymatically synthesized. This definition is understood to include the various polymorphic forms of the protein wherein amino acid substitutions in the variable regions of the sequence do not affect the essential functioning of the protein, or its hydropathic profile or secondary or tertiary structure.

DNA--standard nomenclature is used to identify the bases.

Intronless DNA--a piece of DNA lacking internal non-coding segments, for example, cDNA.

IRP locus sequence--(protooncogene int-1 related), a gene located near the CF gene.

Mutant CFTR--a protein that is highly analogous to CFTR in terms of primary, secondary, and tertiary structure, but wherein a small number of amino acid substitutions and/or deletions and/or insertions result in impairment of its essential function, so that organisms whose epithelial cells express mutant CFTR rather than CFTR demonstrate the symptoms of cystic fibrosis.

mCF--a mouse gene orthologous to the human CF gene

NBFs--nucleotide (ATP) binding folds

ORF--open reading frame

PCR--polymerase chain reaction

Protein--standard single letter nomenclature is used to identify the amino acids

R-domain--a highly charged cytoplasmic domain of the CFTR protein

RSV--Rous Sarcoma Virus

SAP--surfactant protein

RFLP--restriction fragment length polymorphism

507 mutant CF gene--the CF gene which includes a DNA base pair mutation at the 506 or 507 protein position of the cDNA of the CF gene

507 mutant DNA sequence--equivalent meaning to the 507 mutant CF gene

507 mutant CFTR protein or mutant CFTR protein amino acid sequence, or mutant CFTR polypeptide--the mutant CFTR protein wherein an amino acid deletion occurs at the isoleucine 506 or 507 protein position of the CFTR.

Protein position means amino acid residue position.

2. Isolating the CF Gene

Using chromosome walking, jumping, and cDNA hybridization, DNA sequences encompassing >500 kilobase pairs (kb) have been isolated from a region on the long arm of human chromosome 7 containing the cystic fibrosis (CF) gene. This technique is disclosed in detail in the aforementioned co-pending U.S. patent applications. For purposes of convenience in understanding and isolating the CF gene and identifying other mutations, such as at the 85, 148, 178, 455, 493, 507, 542, 549, 560, 563, 574, 1077 and 1092 amino acid residue positions, the technique is reiterated here. Several transcribed sequences and conserved segments have been identified in this region. One of these corresponds to the CF gene and spans approximately 250 kb of genomic DNA. Overlapping complementary DNA (cDNA) clones have been isolated from epithelial cell libraries with a genomic DNA segment containing a portion of the cystic fibrosis gene. The nucleotide sequence of the isolated cDNA is shown in FIGS. 1 through 18. In each row of the respective sequences the lower row is a list by standard nomenclature of the nucleotide sequence. The upper row in each respective row of sequences is standard single letter nomenclature for the amino acid corresponding to the respective codon.

Accordingly, the isolation of the CF gene provided a cDNA molecule comprising a DNA sequence selected from the group consisting of:

(a) DNA sequences SEQ ID NO: 1;

(b) DNA sequences having SEQ ID NO: 1 and encoding normal CFTR polypeptide (SEQ ID NO: 2);

(c) DNA sequences which correspond to a fragment of SEQ ID NO: 1 including at least 16 sequential nucleotides of SEQ ID NO: 1;

(d) DNA sequences which comprise at least 16 nucleotides and encode a fragment of the amino acid sequence of FIG. 1 (SEQ ID NO: 2); and

(e) DNA sequences encoding an epitope encoded by at least 18 sequential nucleotides of SEQ ID NO: 1.

According to this invention, the isolation of other mutations in the CF gene also provides a cDNA molecule comprising a DNA sequence selected from the group consisting of:

a) DNA sequences which correspond to the DNA sequence encoding mutant CFTR polypeptide characterized by cystic fibrosis-associated activity in human epithelial cells, or the DNA sequence of SEQ ID NO: 1, yet further characterized by a base pair mutation which results in the deletion of or a change for an amino acid at residue positions 85, 148, 178, 455, 493, 507, 542, 549, 551, 560, 563, 574, 1077 and 1092 of SEQ ID NO: 2.

b) DNA sequences which correspond to fragments of the mutant portion of the sequence of paragraph a) and which include at least sixteen nucleotides;

c) DNA sequences which comprise at least sixteen nucleotides and encode a fragment of the amino acid sequence encoded for by the mutant portion of the DNA sequence of paragraph a); and

d) DNA sequences encoding an epitope encoded by at least 18 sequential nucleotides in the mutant portion of the sequence of the DNA of paragraph a).
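One way to read "fragments of the mutant portion ... which include at least sixteen nucleotides" is as the set of sixteen-nucleotide windows that overlap the mutated position. The sketch below enumerates those windows for a toy sequence. The function name `windows_spanning`, the placeholder sequence and the chosen index are all hypothetical, and this reading of the fragment language is an assumption made for illustration, not a construction of the claims.

```python
def windows_spanning(seq: str, pos: int, k: int = 16) -> list:
    """Return every k-nucleotide substring of seq that covers index pos (0-based)."""
    first = max(0, pos - k + 1)       # leftmost window start that still covers pos
    last = min(pos, len(seq) - k)     # rightmost window start that fits in seq
    return [seq[i:i + k] for i in range(first, last + 1)]

# Toy 40-nucleotide sequence with a hypothetical mutated base at index 20.
toy = "ACGT" * 10
fragments = windows_spanning(toy, pos=20, k=16)
print(len(fragments))  # number of distinct window positions covering the site
```

For a site well inside the sequence there are exactly k such windows; near either end of the sequence there are fewer.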

Transcripts of approximately 6,500 nucleotides in size are detectable in tissues affected in patients with CF. Based upon the isolated nucleotide sequence, the predicted protein consists of two similar regions, each containing a first domain having properties consistent with membrane association and a second domain believed to be involved in ATP binding.

A 3 bp deletion which results in the omission of a phenylalanine residue at the center of the first predicted nucleotide binding domain (amino acid position 508 of the CF gene product) was detected in CF patients. This mutation in the normal DNA sequence of FIG. 1 corresponds to approximately 70% of the mutations in cystic fibrosis patients. Extended haplotype data based on DNA markers closely linked to the putative disease gene suggest that the remainder of the CF mutant gene pool consists of multiple, different mutations. This is now exemplified by this invention at, for example, the 506 or 507 protein position. A small set of these latter mutant alleles (approximately 8%) may confer residual pancreatic exocrine function in a subgroup of patients who are pancreatic sufficient.
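The effect of an in-frame 3 bp deletion can be illustrated with a short translation sketch. The five-codon window used below (ATC ATC TTT GGT GTT, read here as Ile-Ile-Phe-Gly-Val for positions 506 to 510) is an assumption supplied for the example, and FIG. 1 and FIGS. 16A and 16B remain the authoritative sequence. The codon table is deliberately limited to the codons that occur in this window.

```python
# Partial codon table: only the codons needed for this illustration.
CODON_TABLE = {"ATC": "I", "ATT": "I", "TTT": "F", "GGT": "G", "GTT": "V"}

def translate(dna: str) -> str:
    """Translate an in-frame DNA string using the partial table above."""
    codons = [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]
    return "".join(CODON_TABLE[c] for c in codons)

# Assumed normal sequence for codons 506-510 (Ile Ile Phe Gly Val).
normal = "ATCATCTTTGGTGTT"

# Remove three consecutive bases (CTT) that straddle codons 507 and 508.
cut = normal.find("CTT")
mutant = normal[:cut] + normal[cut + 3:]

print(translate(normal))  # IIFGV
print(translate(mutant))  # IIGV
```

In this window the deletion leaves the reading frame intact: the remaining bases of codons 507 and 508 rejoin as ATT, which still encodes isoleucine, so the only change in the product is the missing phenylalanine.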

2.1 Chromosome Walking and Jumping

Large amounts of the DNA surrounding the D7S122 and D7S340 linkage regions of Rommens et al supra were searched for candidate gene sequences. In addition to conventional chromosome walking methods, chromosome jumping techniques were employed to accelerate the search process. From each jump endpoint a new bidirectional walk could be initiated. Sequential walks halted by "unclonable" regions often encountered in the mammalian genome could be circumvented by chromosome jumping.
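The interplay of walking and jumping can be pictured with a deliberately simple toy model: walk in fixed steps, and make a jump whenever the next walk step would run into a region that cannot be cloned. Everything numeric in the sketch is illustrative. The 15 kb walk step and 50 kb jump are hypothetical values, the two blocked intervals are borrowed loosely from the less clonable positions reported for FIG. 2, and the function `traverse` is not a description of the procedure actually followed.

```python
def traverse(target_kb, walk_kb, jump_kb, unclonable):
    """Count walk and jump steps needed to reach target_kb from position 0.

    unclonable is a list of (start_kb, end_kb) intervals that a walk step
    cannot enter; a jump is used to pass over them.
    """
    pos, walks, jumps = 0, 0, 0
    while pos < target_kb:
        nxt = pos + walk_kb
        # Does the next walk step overlap any unclonable interval?
        hit = next(((s, e) for s, e in unclonable if s < nxt and e > pos), None)
        if hit is None:
            pos, walks = nxt, walks + 1
        else:
            if pos + jump_kb < hit[1]:
                raise ValueError("jump too short to clear the unclonable region")
            pos, jumps = pos + jump_kb, jumps + 1
    return walks, jumps

# Hypothetical step sizes; blocked intervals chosen for illustration only.
walks, jumps = traverse(250, walk_kb=15, jump_kb=50,
                        unclonable=[(75, 100), (205, 225)])
print(walks, jumps)  # 10 2
```

With these made-up numbers the 250 kb stretch is covered in ten walk steps and two jumps, whereas walking alone would stall at the first blocked interval.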

The chromosome jumping library used has been described previously [Collins et al, Science 235, 1046 (1987); Ianuzzi et al, Am. J. Hum. Genet. 44, 695 (1989)]. The original library was prepared from a preparative pulsed field gel, and was intended to contain partial EcoRI fragments of 70-130 kb; subsequent experience with this library indicates that smaller fragments were also represented, and jump sizes of 25-110 kb have been found. The library was plated on sup⁻ host MC1061 and screened by standard techniques [Maniatis et al]. Positive clones were subcloned into pBRΔ23Ava and the beginning and end of the jump identified by EcoRI and Ava I digestion, as described in Collins, Genome analysis: A practical approach (IRL, London, 1988), pp. 73-94. For each clone, a fragment from the end of the jump was checked to confirm its location on chromosome 7. The contiguous chromosome region covered by chromosome walking and jumping was about 250 kb. Direction of the jumps was biased by careful choice of probes, as described by Collins et al and Ianuzzi et al, supra. The entire region cloned, including the sequences isolated with the use of the CF gene cDNA, is approximately 500 kb.

The schematic representation of the chromosome walking and jumping strategy is illustrated in FIG. 2. CF gene exons are indicated by Roman numerals in this Figure. Horizontal lines above the map indicate walk steps whereas the arcs above the map indicate jump steps. The Figure proceeds from left to right in each of six tiers with the direction of ends toward 7 cen and 7 qter as indicated. The restriction map for the enzymes EcoRI, HindIII, and BamHI is shown above the solid line, spanning the entire cloned region. Restriction sites indicated with arrows rather than vertical lines indicate sites which have not been unequivocally positioned. Additional restriction sites for other enzymes are shown below the line. Gaps in the cloned region are indicated by ||. These occur only in the portion detected by cDNA clones of the CF transcript. These gaps are unlikely to be large based on pulsed field mapping of the region. The walking clones, as indicated by horizontal arrows above the map, have the direction of the arrow indicating the walking progress obtained with each clone. Cosmid clones begin with the letter c; all other clones are phage. Cosmid CF26 proved to be a chimera; the dashed portion is derived from a different genomic fragment on another chromosome. Roman numerals I through XXIV indicate the location of exons of the CF gene. The horizontal boxes shown above the line are probes used during the experiments. Three of the probes represent independent subcloning of fragments previously identified to detect polymorphisms in this region: H2.3A corresponds to probe XV2C (X. Estivill et al, Nature, 326: 840 (1987)), probe E1 corresponds to KM19 (Estivill, supra), and probe E4.1 corresponds to Mp6d.9 (X. Estivill et al. Am. J. Hum. Genet. 44, 704 (1989)). G-2 is a subfragment of E6 which detects a transcribed sequence. R161, R159, and R160 are synthetic oligonucleotides constructed from parts of the IRP locus sequence [B. J. Wainwright et al, EMBO J., 7: 1743 (1988)], indicating the location of this transcript on the genomic map.

As the two independently isolated DNA markers, D7S122 (pH131) and D7S340 (TM58), were only approximately 10 kb apart (FIG. 2), the walks and jumps were essentially initiated from a single point. The direction of walking and jumping with respect to MET and D7S8 was then established with the crossing of several rare-cutting restriction endonuclease recognition sites (such as those for Xho I, Nru I and Not I, see FIG. 2) and with reference to the long range physical map of J. M. Rommens et al, Am. J. Hum. Genet., in press; A. M. Poustka et al, Genomics 2, 337 (1988); M. L. Drumm et al, Genomics 2, 346 (1988). The pulsed field mapping data also revealed that the Not I site identified by the inventors of the present invention (see FIG. 2, position 113 kb) corresponded to the one previously found associated with the IRP locus (Estivill et al 1987, supra). Since subsequent genetic studies showed that CF was most likely located between IRP and D7S8 [M. Farrall et al, Am. J. Hum. Genet. 43, 471 (1988); B. S. Kerem et al, Am. J. Hum. Genet. 44, 827 (1989)], the walking and jumping effort was continued exclusively towards cloning of this interval. It is appreciated, however, that other coding regions identified in FIG. 2, for example G-2, CF14 and CF16, were located and extensively investigated. These investigations revealed, on the basis of genetic data and sequence analysis, that those regions were not the CF gene. Given the lack of knowledge of the location of the CF gene and its characteristics, the extensive and time-consuming examination of the nearby presumptive coding regions did not advance the direction of search for the CF gene. However, these investigations were necessary in order to rule out the possibility of the CF gene being in those regions.

Three regions in the 280 kb segment were found not to be readily recoverable in the amplified genomic libraries initially used. These less clonable regions were located near the DNA segments H2.3A and X.6, and just beyond cosmid cW44, at positions 75-100 kb, 205-225 kb, and 275-285 kb in FIG. 2, respectively. The recombinant clones near H2.3A were found to be very unstable, with dramatic rearrangements after only a few passages of bacterial culture. To fill in the resulting gaps, primary walking libraries were constructed using special host-vector systems which have been reported to allow propagation of unstable sequences [A. R. Wyman, L. B. Wolfe, D. Botstein, Proc. Nat. Acad. Sci. U.S.A. 82, 2880 (1985); K. F. Wertman, A. R. Wyman, D. Botstein, Gene 49, 253 (1986); A. R. Wyman, K. F. Wertman, D. Barker, C. Helms, W. H. Petri, Gene 49, 263 (1986)]. Although the region near cosmid cW44 remains to be recovered, the region near X.6 was successfully rescued with these libraries.

2.2 Construction of Genomic Libraries

Genomic libraries were constructed according to procedures described in Maniatis et al, Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. 1982) and are listed in Table 1. These include eight phage libraries, one of which was provided by T. Maniatis [Fritsch et al, Cell 19:959 (1980)]; the rest were constructed as part of this work according to procedures described in Maniatis et al, supra. Four phage libraries were cloned in λDASH (commercially available from Stratagene) and three in λFIX (commercially available from Stratagene), with vector arms provided by the manufacturer. One λDASH library was constructed from Sau 3A-partially digested DNA from a human-hamster hybrid containing human chromosome 7 (4AF/102/K015) [Rommens et al, Am. J. Hum. Genet. 43:4 (1988)], and other libraries from partial Sau 3A, total BamHI, or total EcoRI digestion of human peripheral blood or lymphoblastoid DNA. To avoid loss of unstable sequences, five of the phage libraries were propagated on the recombination-deficient hosts DB1316 (recD⁻), CES 200 (recBC⁻) [Wyman et al, supra; Wertman et al, supra; Wyman et al, supra] or TAP90 [Patterson et al, Nucleic Acids Res. 15:6298 (1987)]. Three cosmid libraries were then constructed. In one, the vector pCV108 [Lau et al, Proc. Natl. Acad. Sci. USA 80:5225 (1983)] was used to clone partially digested (Sau 3A) DNA from 4AF/102/K015 [Rommens et al, Am. J. Hum. Genet. 43:4 (1988)]. A second cosmid library was prepared by cloning partially digested (Mbo I) human lymphoblastoid DNA into the vector pWE-IL2R, prepared by inserting the RSV (Rous Sarcoma Virus) promoter-driven cDNA for the interleukin-2 receptor α-chain (supplied by M. Fordis and B. Howard) in place of the neo-resistance gene of pWE15 [Wahl et al, Proc. Natl. Acad. Sci. USA 84:2160 (1987)]. An additional partial Mbo I cosmid library was prepared in the vector pWE-IL2-Sal, created by inserting a Sal I linker into the Bam HI cloning site of pWE-IL2R (M. Drumm, unpublished data); this allows the use of the partial fill-in technique to ligate Sal I and Mbo I ends, preventing tandem insertions [Zabarovsky et al, Gene 42:19 (1986)]. Cosmid libraries were propagated in E. coli host strains DH1 or 490A [M. Steinmetz, A. Winoto, K. Minard, L. Hood, Cell 28, 489 (1982)].

TABLE 1

    ____________________________________________________________________________________________________________________________________
    GENOMIC LIBRARIES
    Vector                  Source of human DNA                                             Host    Complexity            Ref
    ____________________________________________________________________________________________________________________________________
    Charon 4A               HaeII/AluI-partially digested total human liver DNA             LE392   1 × 10⁶ (amplified)   Lawn et al 1980
    pCV108                  Sau3A-partially digested DNA from 4AF/KO15                      DH1     3 × 10⁶ (amplified)
    λdash                   Sau3A-partially digested DNA from 4AF/KO15                      LE392   1 × 10⁶ (amplified)
    λdash                   Sau3A-partially digested total human peripheral blood DNA       DB1316  1.5 × 10⁶
    λdash                   BamHI-digested total human peripheral blood DNA                 DB1316  1.5 × 10⁶
    λdash                   EcoRI-partially digested total human peripheral blood DNA       DB1316  8 × 10⁶
    λFIX                    MboI-partially digested human lymphoblastoid DNA                LE392   1.5 × 10⁶
    λFIX                    MboI-partially digested human lymphoblastoid DNA                CES200  1.2 × 10⁶
    λFIX                    MboI-partially digested human lymphoblastoid DNA                TAP90   1.3 × 10⁶
    pWE-IL2R                MboI-partially digested human lymphoblastoid DNA                490A    5 × 10³
    pWE-IL2R-Sal            MboI-partially digested human lymphoblastoid DNA                490A    1.2 × 10⁶
    λCh 3A Δlac (Jumping)   EcoRI-partially digested (24-110 kb) human lymphoblastoid DNA   MC1061  3 × 10⁶               Collins et al., supra and Ianuzzi et al., supra
    ____________________________________________________________________________________________________________________________________

Three of the phage libraries were propagated and amplified in E. coli bacterial strain LE392. Four subsequent libraries were plated on the recombination-deficient hosts DB1316 (recD⁻) or CES200 (recBC⁻) [Wyman 1985, supra; Wertman 1986, supra; and Wyman 1986, supra] or in one case TAP90 [T. A. Patterson and M. Dean, Nucleic Acids Research 15, 6298 (1987)].

Single copy DNA segments (free of repetitive elements) near the ends of each phage or cosmid insert were purified and used as probes for library screening to isolate overlapping DNA fragments by standard procedures (Maniatis et al, supra).

1-2×10⁶ phage clones were plated on 25-30 petri dishes (150 mm) with the appropriate indicator bacterial host and incubated at 37° C. for 10-16 hr. Duplicate "lifts" were prepared for each plate with nitrocellulose or nylon membranes, prehybridized and hybridized under conditions described [Rommens et al, 1988, supra]. Probes were labelled with ³²P to a specific activity of >5×10⁸ cpm/μg using the random priming procedure [A. P. Feinberg and B. Vogelstein, Anal. Biochem. 132, 6 (1983)]. The cosmid library was spread on ampicillin-containing plates and screened in a similar manner.
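The plating density implied by these figures can be checked with simple arithmetic. The following is an illustrative sketch only; the function name is hypothetical and the calculation assumes that clones were distributed evenly across the plates, which the text does not state.

```python
def clones_per_plate(total_clones: float, plates: int) -> float:
    """Average number of phage clones per plate, assuming even distribution."""
    if plates <= 0:
        raise ValueError("number of plates must be positive")
    return total_clones / plates


# Extremes of the ranges quoted in the text: 1-2 x 10^6 clones on 25-30 plates.
lowest = clones_per_plate(1e6, 30)
highest = clones_per_plate(2e6, 25)
print(f"{lowest:,.0f} to {highest:,.0f} clones per 150 mm plate")
```

Under this assumption each plate carried on the order of tens of thousands of clones.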

DNA probes which gave high background signals could often be used more successfully by preannealing the boiled probe with 250 μg/ml sheared denatured placental DNA for 60 minutes prior to adding the probe to the hybridization bag.

For each walk step, the identity of the cloned DNA fragment was determined by hybridization with a somatic cell hybrid panel to confirm its chromosomal location, and by restriction mapping and Southern blot analysis to confirm its colinearity with the genome.

The total combined cloned region of the genomic DNA sequences isolated and the overlapping cDNA clones extended >500 kb. To ensure that the DNA segments isolated by the chromosome walking and jumping procedures were colinear with the genomic sequence, each segment was examined by:

(a) hybridization analysis with human-rodent somatic hybrid cell lines to confirm chromosome 7 localization,

(b) pulsed field gel electrophoresis, and

(c) comparison of the restriction map of the cloned DNA to that of the genomic DNA.

Accordingly, single copy human DNA sequences were isolated from each recombinant phage and cosmid clone and used as probes in each of these hybridization analyses, as performed by the procedure of Maniatis et al, supra.

While the majority of phage and cosmid isolates represented correct walk and jump clones, a few resulted from cloning artifacts or cross-hybridizing sequences from other regions in the human genome, or from the hamster genome in cases where the libraries were derived from a human-hamster hybrid cell line. Confirmation of correct localization was particularly important for clones isolated by chromosome jumping. Many jump clones were considered and yielded inconclusive information, leading the direction of investigation away from the gene.

2.3 Confirmation of the Restriction Map

Further confirmation of the overall physical map of the overlapping clones was obtained by long range restriction mapping analysis with the use of pulsed field gel electrophoresis (J. M. Rommens et al, Am. J. Hum. Genet., in press; A. M. Poustka et al, 1988, supra; M. L. Drumm et al, 1988, supra).

FIGS. 3A to 3E illustrate the findings of the long range restriction mapping study, where a schematic representation of the region is given in Panel E. DNA from the human-hamster cell line 4AF/102/K015 was digested with the enzymes (A) Sal I, (B) Xho I, (C) Sfi I and (D) Nae I, separated by pulsed field gel electrophoresis, and transferred to Zetaprobe™ (BioRad). For each enzyme a single blot was sequentially hybridized with the probes indicated below each of the panels of FIGS. 3A to 3D, with stripping of the blot between hybridizations. The symbols for each enzyme of FIG. 3E are: A, Nae I; B, Bss HII; F, Sfi I; L, Sal I; M, Mlu I; N, Not I; R, Nru I; and X, Xho I. C corresponds to the compression zone region of the gel. DNA preparations, restriction digestion, and crossed field gel electrophoresis methods have been described (Rommens et al, in press, supra). The gels in FIG. 3 were run in 0.5× TBE at 7 volts/cm for 20 hours with switching linearly ramped from 10-40 seconds for (A), (B), and (C), and at 8 volts/cm for 20 hours with switching ramped linearly from 50-150 seconds for (D). Schematic interpretations of the hybridization pattern are given below each panel. Fragment lengths are in kilobases and were sized by comparison to oligomerized bacteriophage λ DNA and Saccharomyces cerevisiae chromosomes.
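The run conditions above can be illustrated with a short calculation. The sketch below is not part of the described method; it assumes that the switch interval increases linearly with elapsed run time, and it uses an approximate λ genome length of 48.5 kb to list the sizes of an oligomerized λ ladder. The function names are hypothetical.

```python
LAMBDA_KB = 48.5  # approximate length of the bacteriophage lambda genome, in kb


def switch_time(elapsed_h: float, run_h: float, start_s: float, end_s: float) -> float:
    """Switch interval in seconds at a given elapsed time, for a linear ramp."""
    if not 0 <= elapsed_h <= run_h:
        raise ValueError("elapsed time must lie within the run")
    return start_s + (end_s - start_s) * elapsed_h / run_h


def lambda_ladder(n: int) -> list[float]:
    """Sizes in kb of the first n oligomers of lambda DNA (monomer, dimer, ...)."""
    return [LAMBDA_KB * i for i in range(1, n + 1)]


# Midpoint of the 20 hour runs described for panels (A)-(C) and for panel (D).
print(switch_time(10, 20, 10, 40))
print(switch_time(10, 20, 50, 150))
print(lambda_ladder(4))
```

The longer switch intervals used for panel (D) are consistent with the general practice of using longer pulses to resolve larger fragments.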

H4.0, J44, EG1.4 are genomic probes generated from the walking and jumping experiments (see FIG. 2). J30 has been isolated by four consecutive jumps from D7S8 (Collins et al, 1987, supra; Ianuzzi et al, 1989, supra; M. Dean et al, submitted for publication). 10-1, B.75, and CE1.5/1.0 are cDNA probes which cover different regions of the CF transcript: 10-1 contains exons I-VI, B.75 contains exons V-XII, and CE1.5/1.0 contains exons XII-XXIV. Shown in FIG. 3E is a composite map of the entire MET-D7S8 interval. The open boxed region indicates the segment cloned by walking and jumping, and the closed arrow portion indicates the region covered by the CF transcript. The CpG-rich region associated with the D7S23 locus (Estivill et al, 1987, supra) is at the Not I site shown in parentheses. This and other sites shown in parentheses or square brackets do not cut in 4AF/102/K015, but have been observed in human lymphoblast cell lines.

2.4 Identification of CF Gene

Based on the findings of long range restriction mapping detailed above, it was determined that the entire CF gene is contained on a 380 kb Sal I fragment. Alignment of the restriction sites derived from pulsed field gel analysis to those identified in the partially overlapping genomic DNA clones revealed that the size of the CF gene was approximately 250 kb.

The most informative restriction enzyme that served to align the map of the cloned DNA fragments and the long range restriction map was Xho I; all of the 9 Xho I sites identified with the recombinant DNA clones appeared to be susceptible to at least partial cleavage in genomic DNA (compare maps in FIGS. 1 and 2). Furthermore, hybridization analysis with probes derived from the 3' end of the CF gene identified 2 Sfi I sites and confirmed the position of an anticipated Nae I site.

These findings further supported the conclusion that the DNA segments isolated by the chromosome walking and jumping procedures were colinear with the genomic sequence.

2.5 Criteria for Identification

A positive result based on one or more of the following criteria suggested that a cloned DNA segment may contain candidate gene sequences:

(a) detection of cross-hybridizing sequences in other species (as many genes show evolutionary conservation),

(b) identification of CpG islands, which often mark the 5' end of vertebrate genes [A. P. Bird, Nature, 321, 209 (1986); M. Gardiner-Garden and M. Frommer, J. Mol. Biol. 196, 261 (1987)],

(c) examination of possible mRNA transcripts in tissues affected in CF patients,

(d) isolation of corresponding cDNA sequences,

(e) identification of open reading frames by direct sequencing of cloned DNA segments.
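Criterion (b) can be illustrated computationally. The sketch below applies the commonly used sequence criteria associated with the cited work of Gardiner-Garden and Frommer (length of at least 200 bp, G+C content above 50%, and an observed/expected CpG ratio above 0.6). It is an illustration only and not the procedure used in the work described here; the function names and the synthetic test sequences are hypothetical.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (G+C fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # CpG dinucleotides on the given strand
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count for independent bases is (C count * G count) / length.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply the three thresholds to a whole sequence (no sliding window)."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp


# Synthetic sequences chosen only to exercise the function.
print(looks_like_cpg_island("CG" * 100))
print(looks_like_cpg_island("AT" * 100))
```

A practical implementation would normally scan a long sequence with a sliding window rather than test the whole sequence at once.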

Cross-species hybridization showed strong sequence conservation between human and bovine DNA when CF14, E4.3 and H1.6 were used as probes, the results of which are shown in FIGS. 4A, 4B and 4C.

Human, bovine, mouse, hamster, and chicken genomic DNAs were digested with Eco RI (R), Hind III (H), and Pst I (P), electrophoresed, and blotted to Zetabind™ (BioRad). The hybridization procedures of Rommens et al, 1988, supra, were used with the most stringent wash at 55° C., 0.2× SSC, and 0.1% SDS. The probes used for hybridization, in FIG. 4, included: (A) entire cosmid CF14, (B) E4.3, (C) H1.6. In the schematic of FIG. 4(D), the shaded region indicates the area of cross-species conservation.

The fact that different subsets of bands were detected in bovine DNA with these two overlapping DNA segments (H1.6 and E4.3) suggested that the conserved sequences were located at the boundaries of the overlapped region (FIG. 4(D)). When these DNA segments were used to detect RNA transcripts from a variety of tissues, no hybridization signal was detected. In an attempt to understand the cross-hybridizing region and to identify possible open reading frames, the DNA sequences of the entire H1.6 and part of the E4.3 fragment were determined. The results showed that, except for a long stretch of CG-rich sequence containing the recognition sites for two restriction enzymes (Bss HII and Sac II) often found associated with undermethylated CpG islands, there are only short open reading frames, which could not easily explain the strong cross-species hybridization signals.

To examine the methylation status of this highly CpG-rich region revealed by sequencing, genomic DNA samples prepared from fibroblasts and lymphoblasts were digested with the restriction enzymes Hpa II and Msp I and analyzed by gel blot hybridization. The enzyme Hpa II cuts the DNA sequence 5'-CCGG-3' only when the second cytosine is unmethylated, whereas Msp I cuts this sequence regardless of the state of methylation. Small DNA fragments were generated by both enzymes, indicating that this CpG-rich region is indeed undermethylated in genomic DNA. The gel-blot hybridization with the E4.3 segment (FIG. 6) reveals very small hybridizing fragments with both enzymes, indicating the presence of a hypomethylated CpG island.
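The logic of the Hpa II/Msp I comparison can be sketched as a simple in silico digest. The model below follows the description given above (Hpa II is blocked by methylation of the second cytosine of CCGG, Msp I is not) and places the cut after the first cytosine. The sequence, the site positions and the function name are hypothetical and serve only to show why similar small fragments with both enzymes indicate an unmethylated region.

```python
SITE = "CCGG"


def digest(seq: str, methylated_sites: set[int], enzyme: str) -> list[int]:
    """Return fragment lengths after cutting seq at CCGG sites.

    methylated_sites holds the start indices of CCGG sites whose second
    cytosine is methylated. Msp I cuts every site; Hpa II cuts only the
    unmethylated ones.
    """
    if enzyme not in ("HpaII", "MspI"):
        raise ValueError("enzyme must be 'HpaII' or 'MspI'")
    cuts = []
    i = seq.find(SITE)
    while i != -1:
        if enzyme == "MspI" or i not in methylated_sites:
            cuts.append(i + 1)  # both enzymes cut between the two cytosines
        i = seq.find(SITE, i + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical 68 nt sequence with CCGG sites at indices 10 and 34.
seq = "A" * 10 + SITE + "A" * 20 + SITE + "A" * 30
print(digest(seq, set(), "MspI"))       # all sites cut
print(digest(seq, set(), "HpaII"))      # unmethylated: same pattern as Msp I
print(digest(seq, {10, 34}, "HpaII"))   # fully methylated: no cleavage
```

When the Hpa II pattern matches the Msp I pattern, as in the second call, the sites are unmethylated, which is the situation reported for the E4.3 region.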

The above results strongly suggest the presence of a coding region at this locus. Two DNA segments (E4.3 and H1.6) which detected cross-species hybridization signals from this area were used as probes to screen cDNA libraries made from several tissues and cell types.

cDNA libraries from cultured epithelial cells were prepared as follows. Sweat gland cells derived from a non-CF individual and from a CF patient were grown to first passage as described [G. Collie et al, In Vitro Cell. Dev. Biol. 21, 592, 1985]. The presence of outwardly rectifying channels was confirmed in these cells (J. A. Tabcharani, T. J. Jensen, J. R. Riordan, J. W. Hanrahan, J. Memb. Biol., in press) but the CF cells were insensitive to activation by cyclic AMP (T. J. Jensen, J. W. Hanrahan, J. A. Tabcharani, M. Buchwald and J. R. Riordan, Pediatric Pulmonology, Supplement 2, 100, 1988). RNA was isolated from them by the method of J. M. Chirgwin et al (Biochemistry 18, 5294, 1979). Poly A+ RNA was selected (H. Aviv and P. Leder, Proc. Natl. Acad. Sci. USA 69, 1408, 1972) and used as template for the synthesis of cDNA with oligo (dT) 12-18 as a primer. The second strand was synthesized according to Gubler and Hoffman (Gene 25, 263, 1983). This was methylated with Eco RI methylase and ends were made flush with T4 DNA polymerase. Phosphorylated Eco RI linkers were ligated to the cDNA and restricted with Eco RI. Removal of excess linkers and partial size fractionation was achieved by Biogel A-50 chromatography. The cDNAs were then ligated into the Eco RI site of the commercially available lambda ZAP. Recombinants were packaged and propagated in E. coli BB4. Portions of the packaging mixes were amplified and the remainder retained for screening prior to amplification. The same procedures were used to construct a library from RNA isolated from preconfluent cultures of the T-84 colonic carcinoma cell line (Dharmsathaphorn, K. et al, Am. J. Physiol. 246, G204, 1984). The numbers of independent recombinants in the three libraries were: 2×10⁶ for the non-CF sweat gland cells, 4.5×10⁶ for the CF sweat gland cells and 3.2×10⁶ from T-84 cells. These phages were plated at 50,000 per 15 cm plate and plaque lifts made using nylon membranes (Biodyne) and probed with DNA fragments labelled with ³²P using DNA polymerase I and a random mixture of oligonucleotides as primer. Hybridization conditions were according to G. M. Wahl and S. L. Berger (Meth. Enzymol. 152, 415, 1987). Bluescript™ plasmids were rescued from plaque purified clones by excision with M13 helper phage. The lung and pancreas libraries were purchased from Clontech Lab Inc. with reported sizes of 1.4×10⁶ and 1.7×10⁶ independent clones.

After screening 7 different libraries, each containing 1×10⁵-5×10⁶ independent clones, a single clone (identified as 10-1) was isolated with H1.6 from a cDNA library made from the cultured sweat gland epithelial cells of an unaffected (non-CF) individual.

DNA sequencing analysis showed that probe 10-1 contained an insert of 920 bp in size and one potential, long open reading frame (ORF). Since one end of the sequence shared perfect sequence identity with H1.6, it was concluded that the cDNA clone was probably derived from this region. The DNA sequence in common was, however, only 113 bp long (see FIGS. 1 and 7). As detailed below, this sequence in fact corresponded to the 5'-most exon of the putative CF gene. The short sequence overlap thus explained the weak hybridization signals in library screening and the inability to detect transcripts in RNA gel-blot analysis. In addition, the orientation of the transcription unit was tentatively established on the basis of alignment of the genomic DNA sequence with the presumptive ORF of 10-1.
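The search for a long open reading frame in a sequenced insert can be sketched as follows. This is a simplified illustration and not the analysis actually applied to 10-1: it scans only the three frames of the given strand, and it reports only complete ORFs that run from an ATG to an in-frame stop codon, whereas a partial cDNA may carry an ORF that is open at one end. The function name and the test sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> tuple[int, int]:
    """Return (start, end) of the longest ATG-to-stop ORF on the given strand.

    end is exclusive and includes the stop codon; (0, 0) means no ORF found.
    """
    seq = seq.upper()
    best = (0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None
    return best


# Short synthetic sequence with one ORF: ATG AAA TTT GGG TAA.
example = "CCATGAAATTTGGGTAACC"
start, end = longest_orf(example)
print(start, end, example[start:end])
```

Aligning such an ORF with the genomic sequence also gives the direction of transcription, since the ORF reads 5' to 3' on the coding strand.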

Since the corresponding transcript was estimated to be approximately 6500 nucleotides in length by RNA gel-blot hybridization experiments, further cDNA library screening was required in order to clone the remainder of the coding region. As a result of several successive screenings with cDNA libraries generated from the colonic carcinoma cell line T84, normal and CF sweat gland cells, pancreas and adult lungs, 18 additional clones were isolated (FIG. 7, as subsequently discussed in greater detail). DNA sequence analysis revealed that none of these cDNA clones corresponded to the length of the observed transcript, but it was possible to derive a consensus sequence based on overlapping regions. Additional cDNA clones corresponding to the 5' and 3' ends of the transcript were derived from 5' and 3' primer-extension experiments. Together, these clones span a total of about 6.1 kb and contain an ORF capable of encoding a polypeptide of 1480 amino acid residues (FIG. 1).
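The relationship between the 1480-residue polypeptide and the approximately 6.1 kb of cloned sequence can be made explicit with a short calculation. The sketch assumes three nucleotides per residue plus one stop codon and treats 6.1 kb as 6100 nucleotides; the resulting non-coding figure is therefore only approximate. The function name is hypothetical.

```python
def coding_nucleotides(residues: int, include_stop: bool = True) -> int:
    """Length in nucleotides of an ORF encoding the given number of residues."""
    return 3 * residues + (3 if include_stop else 0)


orf_nt = coding_nucleotides(1480)   # ORF length including the stop codon
cloned_nt = 6100                    # "about 6.1 kb", taken as 6100 nt
noncoding_nt = cloned_nt - orf_nt   # approximate untranslated sequence
print(orf_nt, noncoding_nt)
```

On these assumptions roughly three quarters of the cloned sequence is coding.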

It was unusual to observe that most of the cDNA clones isolated here contained sequence insertions at various locations of the restriction map of FIG. 7. The map details the genomic structure of the CF gene. Exon/intron boundaries are given, and all cDNA clones isolated are schematically represented on the upper half of the figure. Many of these extra sequences clearly corresponded to intron regions reversely transcribed during the construction of the cDNA, as revealed upon alignment with genomic DNA sequences.

Since the number of recombinant cDNA clones for the CF gene detected in the library screening was much less than would have been expected from the abundance of transcript estimated from RNA hybridization experiments, it seemed probable that the clones that contained aberrant structures were preferentially retained while the proper clones were lost during propagation. Consistent with this interpretation, poor growth was observed for the majority of the recombinant clones isolated in this study, regardless of the vector used.

The procedures used to obtain the 5' and 3' ends of the cDNA were similar to those described (M. Frohman et al, Proc. Nat. Acad. Sci. USA, 85, 8998-9002, 1988). For the 5' end clones, total pancreas and T84 poly A+ RNA samples were reverse transcribed using a primer (10b) which is specific to exon 2, in the same manner as described for the primer extension reaction except that radioactive tracer was included in the reaction. Fractions of the first strand synthesis collected from an agarose bead column were assayed by polymerase chain reaction (PCR). The oligonucleotides used were within the 10-1 sequence (145 nucleotides apart) just 5' of the extension primer. The earliest fractions yielding PCR product were pooled and concentrated by evaporation and subsequently tailed with terminal deoxynucleotidyl transferase (BRL Labs.) and dATP as recommended by the supplier. A second strand synthesis was then carried out with Taq Polymerase (Cetus, AmpliTaq™) using an oligonucleotide containing a tailed linker sequence 5'-CGGAATTCTCGAGATC(T)₁₂-3' SEQ ID NO:5.

Amplification by an anchored PCR experiment, using the linker sequence and a primer just internal to the extension primer which possessed the Eco RI restriction site at its 5' end, was then carried out. Following restriction with the enzymes Eco RI and Bgl II and agarose gel purification, size selected products were cloned into the plasmid Bluescript KS available from Stratagene by standard procedures (Maniatis et al, supra). Essentially all of the recovered clones contained inserts of less than 350 nucleotides. To obtain the 3' end clones, first strand cDNA was prepared with reverse transcription of 2 μg T84 poly A+ RNA using the tailed linker oligonucleotide previously described, with conditions similar to those of the primer extension. Amplification by PCR was then carried out with the linker oligonucleotide and three different oligonucleotides corresponding to known sequences of clone T16-4.5. A preparative scale reaction (2×100 μl) was carried out with one of these oligonucleotides with the sequence 5'-ATGAAGTCCAAGGATTTAG-3' SEQ ID NO: 6.

This oligonucleotide is approximately 70 nucleotides upstream of a Hind III site within the known sequence of T16-4.5. Restriction of the PCR product with Hind III and Xho I was followed by agarose gel purification to size select a band at 1.0-1.4 kb. This product was then cloned into the plasmid Bluescript KS available from Stratagene. Approximately 20% of the obtained clones hybridized to the 3' end portion of T16-4.5. Ten of ten plasmids isolated from these clones had identical restriction maps with insert sizes of approximately 1.2 kb. All of the PCR reactions were carried out for 30 cycles in buffer suggested by an enzyme supplier.
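The tailed linker of SEQ ID NO:5 can be examined for the recognition sequences of the enzymes used in the cloning steps described above. The sketch below expands the (T)₁₂ tail and searches for the standard recognition sequences of Eco RI (GAATTC), Xho I (CTCGAG), Bgl II (AGATCT) and Hind III (AAGCTT). The function name is hypothetical, and the text itself does not state the design rationale of the linker; the search merely shows which sites are present.

```python
SITES = {
    "EcoRI": "GAATTC",
    "XhoI": "CTCGAG",
    "BglII": "AGATCT",
    "HindIII": "AAGCTT",
}

# SEQ ID NO:5 with the (T)12 tail written out in full.
linker = "CGGAATTCTCGAGATC" + "T" * 12


def find_sites(seq: str, sites: dict[str, str]) -> dict[str, list[int]]:
    """Map each enzyme name to the 0-based start positions of its site in seq."""
    return {
        name: [i for i in range(len(seq) - len(motif) + 1)
               if seq.startswith(motif, i)]
        for name, motif in sites.items()
    }


print(len(linker))
print(find_sites(linker, SITES))
```

The linker carries overlapping Eco RI, Xho I and Bgl II sites, the last completed by the first T of the tail, and no Hind III site. This is consistent with the use of Eco RI and Bgl II for the 5' end products and of Xho I together with the internal Hind III site of T16-4.5 for the 3' end products.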

An extension primer positioned 157 nt from the 5' end of the 10-1 clone was used to identify the start point of the putative CF transcript. The primer was end labelled with [γ-³²P]ATP at 5000 Curies/mole and T4 polynucleotide kinase and purified by spun column gel filtration. The radiolabeled primer was then annealed with 4-5 μg poly A+ RNA prepared from T-84 colonic carcinoma cells in 2× reverse transcriptase buffer for 2 hrs. at 60° C. Following dilution and addition of AMV reverse transcriptase (Life Sciences, Inc.), incubation at 41° C. proceeded for 1 hour. The sample was then adjusted to 0.4M NaOH and 20 mM EDTA, and finally neutralized with NH₄OAc, pH 4.6, phenol extracted, ethanol precipitated, redissolved in buffer with formamide, and analyzed on a polyacrylamide sequencing gel. Details of these methods have been described (Meth. Enzymol. 152, 1987, Ed. S. L. Berger, A. R. Kimmel, Academic Press, N.Y.).

Results of the primer extension experiment using an extension oligonucleotide primer starting 157 nucleotides from the 5' end of 10-1 are shown in Panel A of FIG. 10. End labelled φX174 bacteriophage digested with Hae III (BRL Labs) is used as size marker. Two major products are observed at 216 and 100 nucleotides. The sequence corresponding to 100 nucleotides in 10-1 corresponds to a very GC rich sequence (11/12), suggesting that this could be a reverse transcriptase pause site. The 5' anchored PCR results are shown in panel B of FIG. 10. The 1.4% agarose gel shown on the left was blotted and transferred to Zetaprobe™ membrane (Bio-Rad Lab). DNA gel blot hybridization with radiolabeled 10-1 is shown on the right. The 5' extension products are seen to vary in size from 170-280 nt with the major product at about 200 nucleotides. The PCR control lane shows a fragment of 145 nucleotides. It was obtained by using the test oligomers within the 10-1 sequence. The size markers shown correspond to sizes of 154, 220/210, 298, 344, 394 nucleotides (1 kb ladder purchased from BRL Lab).

The schematic shown below Panel B of FIG. 10 outlines the procedure to obtain double stranded cDNA used for the amplification and cloning to generate the clones PA3-5 and TB2-7 shown in FIG. 7. The anchored PCR experiments to characterize the 3' end are shown in panel C. As depicted in the schematic below FIG. 10C, three primers whose positions relative to each other were known were used for amplification with reverse transcribed T84 RNA as described. These products were separated on a 1% agarose gel and blotted onto nylon membrane as described above. DNA-blot hybridization with the 3' portion of the T16-4.5 clone yielded bands of sizes that corresponded to the distance between the specific oligomer used and the 3' end of the transcript. These bands in lanes 1, 2a and 3 are shown schematically below Panel C in FIG. 10. The band in lane 3 is weak as only 60 nucleotides of this segment overlap with the probe used. Also indicated in the schematic and shown in lane 2b is the product generated by restriction of the anchored PCR product to facilitate cloning to generate the THZ-4 clone shown in FIG. 7.

DNA-blot hybridization analysis of genomic DNA digested with EcoRI and HindIII enzymes and probed with portions of cDNAs spanning the entire transcript suggests that the gene contains at least 26 exons, numbered as Roman numerals I through XXVI (see FIG. 9). These correspond to the numbers 1 through 26 shown in FIG. 7. The size of each band is given in kb.

In FIG. 7, open boxes indicate approximate positions of the 24 exons which have been identified by the isolation of >22 clones from the screening of cDNA libraries and from anchored PCR experiments designed to clone the 5' and 3' ends. The lengths in kb of the Eco RI genomic fragments detected by each exon are also indicated. The hatched boxes in FIG. 7 indicate the presence of intron sequences and the stippled boxes indicate other sequences. Depicted in the lower left by the closed box is the relative position of the clone H1.6 used to detect the first cDNA clone 10-1 from among 10⁶ phage of the normal sweat gland library. As shown in FIGS. 4(D) and 7, the genomic clone H1.6 partially overlaps with an EcoRI fragment of 4.3 kb. All of the cDNA clones shown were hybridized to genomic DNA and/or were fine restriction mapped. Examples of the restriction sites occurring within the cDNAs and in the corresponding genomic fragments are indicated.

With reference to FIG. 9, the hybridization analysis uses the following probes: cDNA clone 10-1 for panel A, T16-1 (3' portion) for panel B, T16-4.5 (central portion) for panel C and T16-4.5 (3' end portion) for panel D. In panel A of FIG. 9, the cDNA probe 10-1 detects the genomic bands for exons I through VI. The 3' portion of T16-1 generated by Nru I restriction detects exons IV through XIII as shown in Panel B. This probe partially overlaps with 10-1. Panels C and D, respectively, show genomic bands detected by the central and 3' end EcoRI fragments of the clone T16-4.5. Two EcoRI sites occur within the cDNA sequence and split exons XIII and XIX. As indicated by the exons in parentheses, two genomic EcoRI bands correspond to each of these exons. Cross hybridization to other genomic fragments was observed. These bands, indicated by N, are not of chromosome 7 origin as they did not appear in human-hamster hybrids containing human chromosome 7. The faint band in panel D indicated by XI in brackets is believed to be caused by the cross-hybridization of sequences due to internal homology with the cDNA.

Since 10-1 detected a strong band on gel blot hybridization of RNA from the T-84 colonic carcinoma cell line, this cDNA was used to screen the library constructed from that source. Fifteen positives were obtained, from which clones T6, T6/20, T11, T16-1 and T13-1 were purified and sequenced. Rescreening of the same library with a 0.75 kb Bam HI-Eco RI fragment from the 3' end of T16-1 yielded T16-4.5. A 1.8 kb EcoRI fragment from the 3' end of T16-4.5 yielded T8-B3 and T12a, the latter of which contained a polyadenylation signal and tail. Simultaneously a human lung cDNA library was screened; many clones were isolated, including those shown here with the prefix `CDL`. A pancreas library was also screened, yielding clone CDPJ5.

To obtain copies of this transcript from a CF patient, a cDNA library from RNA of sweat gland epithelial cells from a patient was screened with the 0.75 kb BamHI-EcoRI fragment from the 3' end of T16-1, and clones C16-1 and C1-1/5, which covered all but exon 1, were isolated. These two clones both exhibit a 3 bp deletion in exon 10 which is not present in any other clone containing that exon. Several clones, including CDLS26-1 from the lung library and T6/20 and T13-1 isolated from T84, were derived from partially processed transcripts. This was confirmed by genomic hybridization and by sequencing across the exon-intron boundaries for each clone. T11 also contained additional sequence at each end. T16-4.5 contained a small insertion near the boundary between exons 10 and 11 that did not correspond to intron sequence. Clones CDLS16A, 11a and 13a from the lung library also contained extraneous sequences of unknown origin. The clone C16-1 also contained a short insertion corresponding to a portion of the γ-transposon of E. coli; this element was not detected in the other clones. The 5' clones PA3-5, generated from pancreas RNA, and TB2-7, generated from T84 RNA, both obtained using the anchored PCR technique, have identical sequences except for a single nucleotide difference in length at the 5' end as shown in FIG. 1. The 3' clone, THZ-4, obtained from T84 RNA, contains the 3' sequence of the transcript in concordance with the genomic sequence of this region.

A combined sequence representing the presumptive coding region of the CF gene was generated from overlapping cDNA clones. Since most of the cDNA clones were apparently derived from unprocessed transcripts, further studies were performed to ensure the authenticity of the combined sequence. Each cDNA clone was first tested for localization to chromosome 7 by hybridization analysis with a human-hamster somatic cell hybrid containing a single human chromosome 7 and by pulsed field gel electrophoresis. Fine restriction enzyme mapping was also performed for each clone. While overlapping regions were clearly identifiable for most of the clones, many contained regions of unique restriction patterns.

To further characterize these cDNA clones, they were used as probes in gel hybridization experiments with EcoRI- or HindIII-digested human genomic DNA. As shown in FIG. 9, five to six different restriction fragments could be detected with the 10-1 cDNA and a similar number of fragments with other cDNA clones, suggesting the presence of multiple exons for the putative CF gene. The hybridization studies also identified those cDNA clones with unprocessed intron sequences, as they showed preferential hybridization to a subset of genomic DNA fragments. For the confirmed cDNA clones, their corresponding genomic DNA segments were isolated and the exons and exon/intron boundaries sequenced. As indicated in FIG. 7, at least 27 exons have been identified, which include the split exons 6a, 6b, 14a, 14b, 17a and 17b. Based on this information and the results of physical mapping experiments, the gene locus was estimated to span 250 kb on chromosome 7.

2.6 The Sequence

FIG. 1 shows the nucleotide sequence SEQ ID NO: 1 of the cloned cDNA encoding CFTR together with the deduced amino acid sequence SEQ ID NO: 2. The first base position corresponds to the first nucleotide in the 5' extension clone PA3-5, which is one nucleotide longer than TB2-7. Arrows indicate the position of the transcription initiation site as determined by primer extension analysis. Nucleotide 6129 is followed by a poly(dA) tract. Positions of exon junctions are indicated by vertical lines. Potential membrane-spanning segments were ascertained using the algorithm of Eisenberg et al, J. Mol. Biol. 179:125 (1984). Potential membrane-spanning segments as analyzed and shown in FIG. 11 are enclosed in boxes in FIG. 1. In FIG. 11, the mean hydropathy index [Kyte and Doolittle, J. Molec. Biol. 157: 105, (1982)] of 9-residue peptides is plotted against the amino acid number. The corresponding positions of features of secondary structure predicted according to Garnier et al [J. Molec. Biol. 157, 165 (1982)] are indicated in the lower panel. Amino acids comprising putative ATP-binding folds are underlined in FIG. 1. Possible sites of phosphorylation by protein kinases A (PKA) or C (PKC) are indicated by open and closed circles, respectively. The open triangle is over the 3 bp (CTT) which are deleted in CF (see discussion below). The cDNA clones in FIG. 1 were sequenced by the dideoxy chain termination method employing ³⁵S-labelled nucleotides by the Dupont Genesis 2000™ automatic DNA sequencer.

The combined cDNA sequence spans 6129 base pairs excluding the poly(A) tail at the end of the 3' untranslated region, and it contains an ORF capable of encoding a polypeptide of 1480 amino acids (FIG. 1). An ATG (AUG) triplet is present at the beginning of this ORF (base position 133-135). Since the nucleotide sequence surrounding this codon (5'-AGACCAUGCA-3') SEQ ID NO: 7 has the proposed features of the consensus sequence (CC) A/GCCAUGG(G) of a eukaryotic translation initiation site, with a highly conserved A at the -3 position, it is highly probable that this AUG corresponds to the first methionine codon for the putative polypeptide.
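The coordinates quoted above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and it assumes that the ORF runs uninterrupted from base 133, that the 1480 residues include the initiating methionine, and that a stop codon follows immediately. Under those assumptions it reports the bases at the -3 and +4 positions of the quoted context and the implied position of the stop codon and length of the 3' untranslated region.

```python
# Sketch under stated assumptions; all numbers are taken from the paragraph above.

def start_codon_context(context: str, codon: str = "AUG") -> dict:
    """Return the bases at the -3 and +4 positions around the first AUG."""
    i = context.index(codon)        # 0-based index of the A of AUG (position +1)
    return {
        "minus3": context[i - 3],   # three bases upstream of the A
        "plus4": context[i + 3],    # base immediately following the AUG
    }

def orf_coordinates(first_base: int, n_residues: int) -> tuple:
    """Return the 1-based last sense base and the (start, end) of the stop codon."""
    last_sense = first_base + 3 * n_residues - 1
    return last_sense, (last_sense + 1, last_sense + 3)

ctx = start_codon_context("AGACCAUGCA")
last_sense, stop = orf_coordinates(first_base=133, n_residues=1480)
utr5_length = 133 - 1               # bases preceding the AUG
utr3_length = 6129 - stop[1]        # bases after the stop codon, excluding poly(A)

print(ctx)                          # {'minus3': 'A', 'plus4': 'C'}
print(last_sense, stop)             # 4572 (4573, 4575)
print(utr5_length, utr3_length)     # 132 1554
```

The A at -3 agrees with the conserved purine of the consensus, whereas the +4 base in the quoted context is C rather than the G of the consensus.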

To obtain the sequence corresponding to the 5' end of the transcript, a primer-extension experiment was performed, as described earlier. As shown in FIG. 10A, a primer extension product of approximately 216 nucleotides could be observed, suggesting that the 5' end of the transcript initiated approximately 60 nucleotides upstream of the end of cDNA clone 10-1. A modified polymerase chain reaction (anchored PCR) was then used to facilitate cloning of the 5'-end sequences (FIG. 10b). Two independent 5'-extension clones, one from pancreas and the other from T84 RNA, were characterized by DNA sequencing and were found to differ by only 1 base in length, indicating the most probable initiation site for the transcript as shown in FIG. 1.

Since most of the initial cDNA clones did not contain a poly(A) tail indicative of the end of an mRNA, anchored PCR was also applied to the 3' end of the transcript (Frohman et al, 1988, supra). Three 3'-extension oligonucleotides were made to the terminal portion of the cDNA clone T16-4.5. As shown in FIG. 10c, 3 PCR products of different sizes were obtained. All were consistent with the interpretation that the end of the transcript was approximately 1.2 kb downstream of the HindIII site at nucleotide position 5027 (see FIG. 1). The DNA sequence derived from representative clones was in agreement with that of the T84 cDNA clone T12a (see FIGS. 1 and 7) and the sequence of the corresponding 2.3 kb EcoRI genomic fragment.

3.0 Molecular Genetics of CF

3.1 Sites of Expression

To visualize the transcript for the putative CF gene, RNA gel blot hybridization experiments were performed with the 10-1 cDNA as probe. The RNA hybridization results are shown in FIG. 8.

RNA samples were prepared from tissue samples obtained from surgical pathology or at autopsy according to methods previously described (A. M. Kimmel, S. L. Berger, eds. Meth. Enzymol. 152, 1987). Formaldehyde gels were transferred onto nylon membranes (Zetaprobe™; BioRad Lab). The membranes were then hybridized with DNA probes labeled to high specific activity by the random priming method (A. P. Feinberg and B. Vogelstein, Anal. Biochem. 132, 6, 1983) according to previously published procedures (J. Rommens et al, Am. J. Hum. Genet. 43, 645-663, 1988). FIG. 8 shows hybridization by the cDNA clone 10-1 to a 6.5 kb transcript in the tissues indicated. Total RNA (10 μg) of each tissue, and poly A+ RNA (1 μg) of the T84 colonic carcinoma cell line were separated on a 1% formaldehyde gel. The positions of the 28S and 18S rRNA bands are indicated. Arrows indicate the position of transcripts. Sizing was established by comparison to standard RNA markers (BRL Labs). HL60 is a human promyelocytic leukemia cell line, and T84 is a human colon cancer cell line.

Analysis reveals a prominent band of approximately 6.5 kb in size in T84 cells. Similar, strong hybridization signals were also detected in pancreas and primary cultures of cells from nasal polyps, suggesting that the mature mRNA of the putative CF gene is approximately 6.5 kb. Minor hybridization signals, probably representing degradation products, were detected at the lower size ranges but they varied between different experiments. Identical results were obtained with other cDNA clones as probes. Based on the hybridization band intensity and comparison with those detected for other transcripts under identical experimental conditions, it was estimated that the putative CF transcripts constituted approximately 0.01% of total mRNA in T84 cells.

A number of other tissues were also surveyed by RNA gel blot hybridization analysis in an attempt to correlate the expression pattern of the 10-1 gene and the pathology of CF. As shown in FIG. 8, transcripts, all of identical size, were found in lung, colon, sweat glands (cultured epithelial cells), placenta, liver, and parotid gland, but the signal intensities in these tissues varied among different preparations and were generally weaker than that detected in the pancreas and nasal polyps. For example, hybridization in kidney was not detected in the preparation shown in FIG. 8, but could be discerned in subsequent repeated assays. No hybridization signals could be discerned in the brain or adrenal gland (FIG. 8), nor in skin fibroblast and lymphoblast cell lines.

In summary, expression of the CF gene appeared to occur in many of the tissues examined, with higher levels in those tissues severely affected in CF. While this epithelial tissue-specific expression pattern is in good agreement with the disease pathology, no significant difference has been detected in the amount or size of transcripts from CF and control tissues, consistent with the assumption that CF mutations are subtle changes at the nucleotide level.

3.2 The Major CF Mutation

FIG. 16 shows the DNA sequence at the F508 deletion. On the left is the reverse complement of the sequence from base position 1649-1664 of the normal sequence (as derived from the cDNA clone T16). The nucleotide sequence is displayed as the output (in arbitrary fluorescence intensity units, y-axis) plotted against time (x-axis) for each of the 2 photomultiplier tubes (PMT#1 and #2) of a Dupont Genesis 2000™ DNA analysis system. The corresponding nucleotide sequence is shown underneath. On the right is the same region from a mutant sequence (as derived from the cDNA clone C16). Double-stranded plasmid DNA templates were prepared by the alkaline lysis procedure. Five μg of plasmid DNA and 75 ng of oligonucleotide primer were used in each sequencing reaction according to the protocol recommended by Dupont, except that the annealing was done at 45° C. for 30 min and that the elongation/termination step was for 10 min at 42° C. The unincorporated fluorescent nucleotides were removed by precipitation of the DNA sequencing reaction product with ethanol in the presence of 2.5 M ammonium acetate at pH 7.0, followed by one rinse with 70% ethanol. The primer used for the T16-1 sequencing was a specific oligonucleotide 5'GTTGGCATGCTTTGATGACGCTTC3' SEQ ID NO: 8 spanning base position 1708-1731, and that for C16-1 was the universal primer SK for the Bluescript vector (Stratagene).

FIG. 17 also shows the DNA sequence around the F508 deletion, as determined by manual sequencing. The normal sequence from base position 1726-1651 (from cDNA T16-1) is shown beside the CF sequence (from cDNA C16-1). The left panel shows the sequences from the coding strands obtained with the B primer (5'GTTTTCCTGGATTATGCCTGGCAC3') SEQ ID NO: 9 and the right panel those from the opposite strand with the D primer (5'GTTGGCATGCTTTGATGACGCTTC3') SEQ ID NO: 8. The brackets indicate the three nucleotides in the normal that are absent in CF (arrowheads). Sequencing was performed as described in F. Sanger, S. Nicklen, A. R. Coulson, Proc. Nat. Acad. Sci. U.S.A. 74: 5463 (1977).
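The effect of removing three nucleotides from a coding sequence can be illustrated with a short sketch. The 15-base stretch used below is an assumption supplied for illustration (a segment encoding Ile-Ile-Phe-Gly-Val); it is not read from FIG. 1, FIG. 16 or FIG. 17, and the function names are hypothetical. The point shown is that deleting CTT leaves the reading frame intact and removes a single phenylalanine, even though the deleted bases straddle two codons.

```python
# Minimal codon table covering only the codons used in this illustration.
CODONS = {"ATC": "I", "ATT": "I", "TTT": "F", "GGT": "G", "GTT": "V"}

def translate(dna: str) -> str:
    """Translate complete codons of a coding-strand sequence, reading from base 0."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, usable, 3))

def delete_first(dna: str, motif: str) -> str:
    """Remove the first occurrence of motif from dna."""
    i = dna.index(motif)
    return dna[:i] + dna[i + len(motif):]

# Assumed illustrative stretch, not taken from the figures of this document.
normal = "ATCATCTTTGGTGTT"
mutant = delete_first(normal, "CTT")

print(normal, translate(normal))   # ATCATCTTTGGTGTT IIFGV
print(mutant, translate(mutant))   # ATCATTGGTGTT IIGV
```

Because ATC and ATT both encode isoleucine, the residue upstream of the deletion is unchanged in this illustration and only the phenylalanine is lost.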

The extensive genetic and physical mapping data have directed molecular cloning studies to focus on a small segment of DNA on chromosome 7. Because of the lack of chromosome deletions and rearrangements in CF and the lack of a well-developed functional assay for the CF gene product, the identification of the CF gene required a detailed characterization of the locus itself and comparison between the CF and normal (N) alleles. Random, phenotypically normal, individuals could not be included as controls in the comparison due to the high frequency of symptomless carriers in the population. As a result, only parents of CF patients, each of whom by definition carries an N and a CF chromosome, were suitable for the analysis. Moreover, because of the strong allelic association observed between CF and some of the closely linked DNA markers, it was necessary to exclude the possibility that sequence differences detected between N and CF were polymorphisms associated with the disease locus.

3.3 Identification of RFLPs and Family Studies

To determine the relationship of each of the DNA segments isolated from the chromosome walking and jumping experiments to CF, restriction fragment length polymorphisms (RFLPs) were identified and used to study families where crossover events had previously been detected between CF and other flanking DNA markers. As shown in FIG. 14, a total of 18 RFLPs were detected in the 500 kb region; 17 of them (from E6 to CE1.0) are listed in Table 2, and some of them correspond to markers previously reported.

Five of the RFLPs, namely the two detected by 10-1X.6 and those detected by T6/20, H1.3 and CE1.0, were identified with cDNA and genomic DNA probes derived from the putative CF gene. The RFLP data are presented in Table 2, with markers in the MET and D7S8 regions included for comparison. The physical distances between these markers as well as their relationship to the MET and D7S8 regions are shown in FIG. 14.

                                      TABLE 2
__________________________________________________________________________
RFLPs ASSOCIATED WITH THE CF GENE
Probe               Fragment
name      Enzyme    length (kb)  N(a)  CF-PI(a)  A(b)   *(c)   Reference
__________________________________________________________________________
metD      BanI      7.6          28    48        0.60   0.10   J. E. Spence et al, Am. J. Hum. Genet 39:729 (1986)
                    6.8          59    25
metD      TaqI      6.2          74    75        0.66   0.06   R. White et al, Nature 318:382 (1985)
                    4.8          19    4
metH      TaqI      7.5          45    49        0.35   0.05   White et al, supra
                    4.0          38    20
E6        TaqI      4.4          58    62        0.45   0.06   B. Kerem et al, Am. J. Hum. Genet. 44:827 (1989)
                    3.6          42    17
E7        TaqI      3.9          40    16        0.47   0.07
                    3 + 0.9      51    57
pH131     HinfI     0.4          81    33        0.73   0.15   J. M. Rommens et al, Am. J. Hum. Genet. 43:645 (1988)
                    0.3          18    47
W3D1.4    HindIII   20           82    33        0.68   0.13   B. Kerem et al, supra
                    10           22    47
H2.3A     TaqI      2.1          39    53        0.64   0.09   X. Estivill et al, Nature 326:840 (1987);
(XV2C)              1.4          37    11                      X. Estivill et al, Genomics 1:257 (1987)
EG1.4     HincII    3.8          31    69        0.89   0.17
                    2.8          56    7
EG1.4     BglI      20           27    69        0.89   0.18
                    15           62    9
JG2E1     PstI      7.8          69    10        0.88   0.18   X. Estivill et al supra and B. Kerem et al supra
(KM19)              6.6          30    70
E2.6/E.9  MspI      13           34    6         0.85   0.14
                    8.5          26    55
H2.8A     NcoI      25           22    55        0.87   0.18
                    8            52    9
E4.1      MspI      12           37    8         0.77   0.11   G. Romeo, personal communication
(Mp6d9)             8.5 + 3.5    38    64
J44       XbaI      15.3         40    70        0.86   0.13
                    15 + .3      44    6
10-1X.6   AccI      6.5          67    15        0.90   0.24
                    3.5 + 3      14    60
10-1X.6   HaeIII    1.2          14    61        0.91   0.25
                                 72    15
T6/20     MspI      8            56    66        0.51   0.54
                    4.3          21    8
H1.3      NcoI      2.4          53    7         0.87   0.15
                    1 + 1.4      35    69
CE1.0     NdeI      5.5          81    73        0.41   0.03
                    4.7 + 0.8    8     3
J32       SacI      15           21    24        0.17   0.02   M. C. Iannuzi et al Am. J. Genet. 44:695 (1989)
                    6            47    38
J3.11     MspI      4.2          36    38        0.29   0.04   B. J. Wainright et al, Nature 318:384 (1985)
                    1.8          62    36
J29       PvuII     9            26    36        0.36   0.06   M. C. Iannuzi et al, supra
                    6            55    36
__________________________________________________________________________
NOTES FOR TABLE 2
(a) The number of N and CF-PI (CF with pancreatic insufficiency) chromosomes were derived from the parents in the families used in linkage analysis [Tsui et al, Cold Spring Harbor Symp. Quant. Biol. 51:325 (1986)].
(b) Standardized association (A), which is less influenced by the fluctuation of DNA marker allele distribution among the N chromosomes, is used here for the comparison. Yule's association coefficient A = (ad - bc)/(ad + bc), where a, b, c, and d are the number of N chromosomes with DNA marker allele 1, CF with 1, N with 2, and CF with 2 respectively. Relative risk can be calculated using the relationship RR = (1 + A)/(1 - A) or its reverse.
(c) Allelic association (*), calculated according to A. Chakravarti et al, Am. J. Hum. Genet. 36:1239, (1984) assuming the frequency of 0.02 for CF chromosomes in the population, is included for comparison.
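Note (b) to Table 2 can be made concrete with a short calculation. The sketch below applies the stated formulas to two rows of Table 2; the function names are hypothetical, and comparing the magnitude of A with the tabulated value reflects the observation that the table appears to list A without sign.

```python
def yule_a(a: int, b: int, c: int, d: int) -> float:
    """Yule's association coefficient A = (ad - bc)/(ad + bc).

    a: N chromosomes with allele 1    b: CF chromosomes with allele 1
    c: N chromosomes with allele 2    d: CF chromosomes with allele 2
    """
    return (a * d - b * c) / (a * d + b * c)

def relative_risk(assoc: float) -> float:
    """Relative risk RR = (1 + A)/(1 - A); algebraically equal to ad/bc."""
    return (1 + assoc) / (1 - assoc)

# (a, b, c, d) counts taken from the N and CF-PI columns of Table 2.
rows = {
    "metD BanI": (28, 48, 59, 25),
    "10-1X.6 HaeIII": (14, 61, 72, 15),
}

for name, counts in rows.items():
    assoc = yule_a(*counts)
    rr = relative_risk(assoc)
    # The larger of RR and 1/RR covers the "or its reverse" wording of note (b).
    print(name, round(abs(assoc), 2), round(max(rr, 1 / rr), 1))
```

For these two rows the magnitudes come out as 0.60 and 0.91, matching the A column of Table 2.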

Because of the small number of recombinant families available for the analysis, as was expected from the close distance between the markers studied and CF, and the possibility of misdiagnosis, alternative approaches were necessary in further fine mapping of the CF gene.

3.4 Allelic Association

Allelic association (linkage disequilibrium) has been detected for many closely linked DNA markers. While the utility of using allelic association for measuring genetic distance is uncertain, an overall correlation has been observed between CF and the flanking DNA markers. A strong association with CF was noted for the closer DNA markers, D7S23 and D7S122, whereas little or no association was detected for the more distant markers MET, D7S8 or D7S424 (see FIG. 1).

As shown in Table 2, the degree of association between DNA markers and CF (as measured by Yule's association coefficient) increased from 0.35 for metH and 0.17 for J32 to 0.91 for 10-1X.6 (only CF-PI patient families were used in the analysis as they appeared to be genetically more homogeneous than CF-PS). The association coefficients appeared to be rather constant over the 300 kb from EG1.4 to H1.3; the fluctuations detected at several locations, most notably at H2.3A, E4.1 and T6/20, were probably due to the variation in the allelic distribution among the N chromosomes (see Table 2). These data are therefore consistent with the result from the study of recombinant families (see FIG. 14). A similar conclusion could also be made by inspection of the extended DNA marker haplotypes associated with the CF chromosomes (see below). However, the strong allelic association detected over the large physical distance between EG1.4 and H1.3 did not allow further refined mapping of the CF gene. Since J44 was the last genomic DNA clone isolated by chromosome walking and jumping before a cDNA clone was identified, the strong allelic association detected for the JG2E1-J44 interval prompted us to search for candidate gene sequences over this entire interval. It is of interest to note that the highest degree of allelic association was, in fact, detected between CF and the 2 RFLPs detected by 10-1X.6, a region near the major CF mutation.

Table 3 shows pairwise allelic association between DNA markers closely linked to CF. The average number of chromosomes used in these calculations was 75-80 and only chromosomes from CF-PI families were used in scoring CF chromosomes. Similar results were obtained when Yule's standardized association (A) was used.

                                      TABLE 3

                  N chromosomes
                  metD    metD    metH    E6      E7      pH131   W3D1.4  H2.3A   EG1.4   EG1.4   JG2E1   E2.6    H2.8    E4.1    J44     10-1X.6 10-1X.6 T6/20   H1.3    CE1.0   J32     J3.11   J29
                  BanI    TaqI    TaqI    TaqI    TaqI    HinfI   HindIII TaqI    HincII  BglI    PstI    MspI    NcoI    MspI    XbaI    AccI    HaeIII  MspI    NcoI    NdeI    SacI    MspI    PvuII
CF chromosomes
metD BanI         --      0.35    0.40    0.04    0.04    0.05    0.07    0.27    0.06    0.06    0.07    0.14    0.07    0.09    0.03    0.06    0.10    0.03    0.16    0.05    0.07    0.11    0.02
metD TaqI         0.21    --      0.41    0.13    0.15    0.02    0.01    0.02    0.09    0.15    0.11    0.07    0.24    0.03    0.11    0.08    0.02    0.06    0.13    0.14    0.09    0.09    0.05
metH TaqI         0.81    0.14    --      0.01    0.05    0.06    0.06    0.24    0.05    0.08    0.07    0.13    0.15    0.07    0.04    0.02    0.02    0.07    0.02    0.03    0.21    0.04    0.18
E6 TaqI           0.11    0.30    0.00    --      0.93    0.07    0.06    0.04    0.02    0.03    0.00    0.19    0.02    0.09    0.19    0.09    0.11    0.09    0.15    0.07    0.11    0.20    0.00
E7 TaqI           0.16    0.31    0.02    1.00    --      0.11    0.09    0.03    0.03    0.04    0.01    0.11    0.00    0.07    0.22    0.01    0.02    0.09    0.13    0.06    0.06    0.16    0.04
pH131 HinfI       0.45    0.28    0.23    0.38    0.40    --      0.91    0.12    0.04    0.09    0.05    0.06    0.03    0.03    0.06    0.16    0.15    0.20    0.04    0.03    0.06    0.08    0.06
W3D1.4 HindIII    0.45    0.28    0.23    0.45    0.47    0.95    --      0.21    0.02    0.03    0.01    0.06    0.03    0.03    0.10    0.12    0.10    0.23    0.10    0.05    0.05    0.10    0.06
H2.3A TaqI        0.20    0.11    0.15    0.08    0.11    0.38    0.47    --      0.05    0.11    0.07    0.42    0.14    0.29    0.07    0.27    0.22    0.20    0.09    0.23    0.04    0.08    0.12
EG1.4 HincII      0.11    0.08    0.07    0.06    0.07    0.20    0.20    0.24    --      0.96    0.87    0.76    0.86    0.81    0.60    0.07    0.13    0.61    0.56    0.04    0.24    0.14    0.15
EG1.4 BglI        0.03    0.06    0.07    0.08    0.07    0.27    0.27    0.40    1.00    --      0.92    0.77    0.93    0.71    0.55    0.08    0.07    0.56    0.55    0.12    0.28    0.24    0.20
JG2E1 PstI        0.07    0.06    0.03    0.09    0.06    0.30    0.30    0.45    0.93    0.94    --      0.84    1.00    0.76    0.64    0.11    0.11    0.61    0.57    0.13    0.31    0.26    0.22
E2.6/E.9 MspI     0.22    0.06    0.07    0.02    0.03    0.20    0.20    0.34    0.81    0.82    0.92    --      0.83    0.97    0.76    0.56    0.52    0.47    0.70    0.32    0.31    0.25    0.22
H2.8 NcoI         0.05    0.07    0.01    0.06    0.06    0.31    0.31    0.45    0.92    0.93    1.00    0.92    --      0.74    0.65    0.13    0.18    0.60    0.59    0.10    0.28    0.28    0.18
E4.1 MspI         0.12    0.06    0.07    0.05    0.03    0.25    0.24    0.48    0.82    0.86    0.94    1.00    0.93    --      0.71    0.49    0.49    0.49    0.68    0.34    0.27    0.25    0.21
J44 XbaI          0.18    0.05    0.06    0.01    0.01    0.26    0.26    0.45    0.71    0.69    0.80    0.90    0.80    0.85    --      0.33    0.40    0.65    0.64    0.32    0.24    0.22    0.23
10-1X.6 AccI      0.16    0.10    0.24    0.10    0.11    0.42    0.42    0.64    0.54    0.58    0.64    0.70    0.69    0.69    0.59    --      0.91    0.19    0.35    0.46    0.00    0.02    0.03
10-1X.6 HaeIII    0.16    0.10    0.24    0.08    0.11    0.41    0.41    0.65    0.54    0.58    0.64    0.70    0.69    0.69    0.59    1.00    --      0.18    0.43    0.52    0.02    0.02    0.08
T6/20 MspI        0.27    0.07    0.36    0.13    0.13    0.23    0.23    0.29    0.05    0.00    0.01    0.07    0.02    0.01    0.11    0.69    0.69    --      0.56    0.03    0.21    0.18    0.25
H1.3 NcoI         0.06    0.06    0.06    0.03    0.01    0.30    0.30    0.55    0.71    0.78    0.87    0.90    0.76    0.93    0.92    0.64    0.64    0.12    --      0.40    0.19    0.13    0.20
CE1.0 NdeI        0.00    0.04    0.02    0.11    0.11    0.25    0.25    0.08    0.69    0.59    0.55    0.43    0.55    0.37    0.44    0.24    0.24    0.07    0.40    --      0.19    0.20    0.14
J32 SacI          0.03    0.13    0.07    0.17    0.13    0.17    0.24    0.07    0.21    0.21    0.24    0.22    0.24    0.21    0.21    0.27    0.26    0.13    0.21    0.18    --      0.84    0.97
J3.11 MspI        0.14    0.11    0.15    0.07    0.06    0.05    0.05    0.12    0.11    0.10    0.13    0.18    0.19    0.15    0.20    0.28    0.29    0.24    0.14    0.07    0.81    --      0.71
J29 PvuII         0.11    0.12    0.09    0.10    0.10    0.00    0.00    0.09    0.10    0.10    0.14    0.17    0.20    0.15    0.15    0.29    0.29    0.23    0.15    0.05    0.85    0.97    --

Strong allelic association was also detected among subgroups of RFLPs on both the CF and N chromosomes. As shown in Table 3, the DNA markers that are physically close to each other generally appeared to have strong association with each other. For example, strong (in some cases almost complete) allelic association was detected between adjacent markers E6 and E7, between pH131 and W3D1.4, between the AccI and HaeIII polymorphic sites detected by 10-1X.6, and amongst EG1.4, JG2E1, E2.6(E.9), H2.8 and E4.1. The two groups of distal markers in the MET and D7S8 regions also showed some degree of linkage disequilibrium among themselves, but they showed little association with markers from E6 to CE1.0, consistent with the distant locations for MET and D7S8. On the other hand, the lack of association between DNA markers that are physically close may indicate the presence of recombination hot spots. Examples of these potential hot spots are the region between E7 and pH131, the region around H2.3A, and the region between J44 and the regions covered by the probes 10-1X.6 and T6/20 (see FIG. 14). These regions, containing frequent recombination breakpoints, were useful in the subsequent analysis of extended haplotype data for the CF region.
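The reasoning about potential hot spots can be sketched as a scan over adjacent marker pairs for unexpectedly weak association. In the sketch below, the five values are read from Table 3 as the entry immediately to the right of the "--" in each marker's row, which is taken here to be the N-chromosome association with the next marker; that reading, the 0.25 cutoff and the function name are all assumptions made for illustration.

```python
def weak_intervals(markers, adjacent_assoc, threshold=0.25):
    """Return (left, right, value) for adjacent pairs with association below threshold."""
    if len(adjacent_assoc) != len(markers) - 1:
        raise ValueError("need exactly one association value per adjacent pair")
    return [
        (markers[i], markers[i + 1], value)
        for i, value in enumerate(adjacent_assoc)
        if value < threshold
    ]

# Markers in map order, with the association between each marker and the next.
markers = ["E6", "E7", "pH131", "W3D1.4", "H2.3A", "EG1.4"]
adjacent_assoc = [0.93, 0.11, 0.91, 0.21, 0.05]

for left, right, value in weak_intervals(markers, adjacent_assoc):
    print(f"{left} / {right}: {value:.2f}")
# E7 / pH131: 0.11
# W3D1.4 / H2.3A: 0.21
# H2.3A / EG1.4: 0.05
```

With this cutoff the flagged intervals are E7 to pH131 and the two intervals flanking H2.3A, in line with the first two examples given in the text. A different cutoff would flag a different set, so the output should be read as a screening aid rather than as evidence of recombination.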

3.5 Haplotype Analysis

Extended haplotypes based on 23 DNA markers were generated for the CF and N chromosomes in the collection of families previously used for linkage analysis. Assuming recombination between chromosomes of different haplotypes, it was possible to construct several lineages of the observed CF chromosomes and, also, to predict the location of the disease locus.

To obtain further information useful for understanding the nature of different CF mutations, the F508 deletion data were correlated with the extended DNA marker haplotypes. As shown in Table 4, five major groups of N and CF haplotypes could be defined by the RFLPs within or immediately adjacent to the putative CF gene (regions 6-8).

                                      TABLE 4                                     __________________________________________________________________________    DNA MARKER HAPLOTYPES SPANNING THE CF LOCUS                                   __________________________________________________________________________                              CF.sup.(b)                                          HAPLOTYPES.sup.(a)        PI  PS  PI  PS                                           1 2  3 4 5 6  7 8  9 (F508)                                                                            (F508)                                                                            others                                                                            others                                                                            N                                   __________________________________________________________________________      I.(a)   A     A      A     A      A      A      A      A      A      10                                                              1 -- --  --                                                        A     A      A     A      A                                                    A      --     A      A                                                    3    -- --  -- --                       A     A      A     A      --     A      A      --     A      1    --                                               --  -- --                                A     A      A     A      --     --     A      --     A      --                                                     -- --  --1                              A     A      A     A      A      A      A      A      B      10                                                     -- --  --1                              A     A      --    A      A      A      A      A      B      4    --                                               --  -- --                                A     A      A     A      --     A      
A      A      B      1    --                                               --  -- --                                A     A        --  A      A      A      A      A      C      1    --                                               --  -- --                                B     A      A     A      A      A      A      A      A      4                                                      -- --  -- --                            B     A       --   A      A      A      A      A      A      1                                                      -- --  -- --                            B     A      A     A      --     A      A      A      A      --                                                     1 -- --  --                             B     A      A     A      A      A      A      A      --     1                                                      -- --  -- --                            B     A      A     A      --     --     A      --     A      1    --                                               --  -- --                                A     B      A     A      A      A      A      A      A      1                                                      -- --  -- --                            A     D      A     A      A      A      A      A      A      1    --                                               --  -- --                                A     G      A     A      A      A      A      A      A      1    --                                               --  -- --                                B     B      A     A      A      A      A      A      A      1                                                      -- --  -- --                            B     C      A     A      A      A      A      A      A      2                                                      -- --  -- --                            E     B      A     A      --     --     A      --     A      1                                                      -- --  -- --                            D     B     
 A     A      --     A      --     A      A      1                                                      -- --  -- --                            D     B      B     A      A      A      A      A      A      1                                                      -- --  -- --                            B     A       --   A      A      A      A      A      B      1    --                                               --  -- --                                C     A      --    A      A      A      A      A      B      1   --                                                --  -- --                                A     D      A     A      A      A      A      A      B     1                                                      -- --  -- --                             D     C       A    A      A      A      A      A      B      .   1                                                 -- --  --                                A     D       A    A      --     A      A      A      B       1                                                     -- --  -- --                            D     D      --    A      A      A      A      A      B       --                                                    -- --  --1                            B    B       --     A     A      A      --     A      B       1                                                      -- --  -- --                             A     B        --  A      A      A      A      A      E       2                                                     -- --  -- --                              A     B       A    A      A      A      A      A      E       1                                                  1 -- --  --                              A     E      B     A      A      A      A      A      E       1                                                     -- --  -- --                            A     C      A     A      A      A      A      A      B       1   --                                               --  -- --                
                A     C      --    C      --     A      A      A      B       --                                                    1 -- --  --                             A      B     A     B     A     A      A     --    A      --      --                                                    -- --  1                             B      C     B     A     --    A      A     A/D   B      1      --                                                   -- --  --                           (b)    A       C     --    A     A     A      A     A     A      --                                                    --      --1 --                          A      C     A     A     A     A      A     A     --     --      --                                                    1 -- --                              D      C     --    A     A     A      A     A     B      --      --                                                    1 -- --                              D      C     A     A     A     A      A     A     D      --      --                                                    -- --  1                             F      C     --    A     A     A      A     A     B      --      --                                                    1 -- --                              B      c     A     A     A     A      A     A     B      --     --                                                    3 -- --                            (c)    B      C     A     B     C     A       A     D     A      --                                                    --      -- --  1                        B      C     A     B     C     A      A     D     B      --      --                                                    1 -- --                              F      C     A     B     C     A      A     D     B      --      --                                                    -- --  1                             F      A     A     B     C     A      A     D     B      --     --                                              
      -- --  1                              A      B     A     B     C     A      A     D     B      --      --                                                    -- --  1                             B      B     A     B     C     A      A     D     B      --     --                                                    -- --  1                              B      D     A     B     C     A      --    D     C      --     --                                                    -- --  1                              A      B     A     B     A     A      --    D     A      --      --                                                    -- --  1                          (d)         D      B     A     A     A     A      A     C     A      --                                                    --      -- --  1                         B      C     B     C     A     A      A     C     B      --                                                   --      -- --  1                                                                   57     5     7      1                                                    14                                    II.(a)  B      A     --    B     B     B      A     C     B      --                                                    --      1 -- --                                --       B/C   B    B     B     B      A     C     B      --                                                      --    1 -- --                                 B      A     --    B     --    B      A     A/C   B                                                     --     --       --1 --                            A      B     B     B     B     B      A     C     B    --                                                  --     1 -- --                                   B      B     B     B     B     B      A     C     A      --                                                  --     -- --  3                                A      C     B     B     B     B      A     C     A      --                
                                    --      -- --  1                               A      C     B     B     B     B      --    C     A                                                     --      --      -- --  1                                                                     F      C     B                                                   B     B     B      A     C                                                   A      --     --       -- --                                                 1                                                   A      C     B     B     B     B      A     C     B                                                     --      --      -- --  1                                                                     A      C     --                                                  B     B     B      --    C                                                   C      --        --    -- --                                                 1                                                   B      C     B     B     --    B      A     C     C                                                     --        --    --1 --                              B      C     B     B     B     B      A     C     B                                                     --        --    -- --  1                                                                     B      C     B                                                   B     B     B      A     C                                                   A      --        --    -- --                                                 1                                      B C B B B B A C D -- --  -- --  1                                                           B      C     --    B     B      B     A     C     B                                                      -- --  --  --1                                     B      C     B     B     B      B     --    C     B                                                      -- --  --   --1  
                                  D      C     B     B     B      B     A     C     B                                                      -- --  --   --2                                    D      --    B     B     --     B     A     C     B                                                      -- --  --   --  1                                  F      C     B     B     B      B     A     C     B                                                       -- --  --  --  1                                  C      C     --    B     B      B     A     C     B                                                      -- --  --   --  1                                  A      A     A     B     B      B     A     C     B                                                      -- --  --   --  1                                  B      G     A     B     B      B     A     C     B                                                      -- --  --   --  1                                  F      A     --    B     B      B     A     C     B                                                      -- --  --   --  1                                  B      H     --    B     B      B     A     C     B                                                      -- --  --   --  1                                 B      B     --    B     B      B     A     C     B                                                      -- --  --   --  1                                  A      B     A     B     B      B     A     C     B                                                      -- --  --   --  1                                  F      D     A     B     B      B     A     C     B                                                      -- --  --   --  1                                  C      D     A     B     B      B     A     C     A                                                      -- --  --   --  1                                  B      D     A     B     B      B     A     C     A                                            
          -- --  --   --  1                     B      C     A     B     B      B     A     C     A        -- --  --                                                 --  2                                               A      C     A     B     B      B     A     C     B                                                      -- --  --   --  1                                A      C     A     B     B      B     --    C     B                                                       -- --  --   --  1                                   A      C     A     B     B      B     A     C     C                                                      -- --  --   --  1                                  B      C     A     B     B      B     A     C     B                                                      -- --  --   --  1                                  D      B/C   --    B     B      B     A     C     A                                                      -- --  --   --  2                                  C      C     A     B     B      B     A     C     A                                                      -- --  --   1   --                          D      B     --    B     B      B     A     A/C   B        -- --                                                --   --  1                                  D      B     A     B     --     B     A     C     B        -- --                                                --   --  1                                  A      G     A     B     B      B     A     C      A       -- --                                                --   --  1                                  B      C     --    B     B      B     A     A/C   A        -- --                                                --   --  1                                        A      C     B     D     B      B     A     C     B                                                      -- --  1  --  --                                   A      C     --    D     --     B     A     C    B                  
                                     -- --  --   --    1                                B      B     B     E     B      B     A     C     C                                                      -- --  --   --    1                                F      D     A     B     B      B     A     C     C                                                      -- --  --   1   --                                 A      A     A     A     A      B     A     C     D                                                      -- --  --   --      1                              --     B/C   A     B     C      B     A     C     B                                                      -- --  --   --      1                  A B A B B B/C A C A -- --  -- --  1                                          (b)     A   C      A     B       B   B  A     B      E -- --  1 -- --                                                       A       C     --    B                                                   B       BA     B      B    --                                                 --  1 -- --                           (c)    B        D     --    B     --      BA     A      A                                                                                  -- --   --                                               --   1                                          0 0        6     4      45                                          III.(a)    B       C      B     A       A     C       B     A      B                                                   1 -- --  -- --                       (b)     B       A      B     A       A     C       B     A      B                                                       -- --  1 -- --                           B       C      B     A       A     C       B     A        A                                                      -- --  1 -- --                             B       C      B       A       A     C       B     A      B                                                            -- --  
-- --  1                      B       C      --      A      A     C       B     A      B                                                             -- --  -- --  2                                A       B      --      A       A     C       B     A  B                                                          -- --  -- --  1                                                                     A       B                                                     --      A       A     C                                                       B    A     C               --                                                 --  -- --  1                                         B       B        B     A       A     C      B        A                                                    B                   -- --                                                  -- --  2                                             D       C      B      A       A     C       B     A                                                    A   -- --  --1 --                                    A       B      B       C       A     C       B     A                                                    B         -- --  -- --  1                                                      B B A A A C B A B -- --  2                                                  --1                                    B B -- AACBAB -- -- 1 --1                                                           B       B      A     A       A     C       B     A      A  -- --                                                -- 1 --                                D       A      A     A       A     C       B     A  B         -- --                                                 -- --  1                                    D       C      A     A       A     C       B     A      B     --                                                --  -- --  2                                 A       C      --    A       A     C       B     A      B                                       
                -- --  1 --1                                D        B      A    A       A     C       --    A      C                                                         -- --  -- --  1                    (c)    A       A      A     B       B     C       B     A    --                                                       -- --  1 -- --                                F       B      B     B       B     C       B     A      B                                                        -- --  -- --  1                           D       B      B     B       B     C       B     A      A                                                         . -- --  --1                                 1 0 7 2 17                                                          IV.     F       C      B     A       A     C       B     C      A -- --                                                -- 1 --                                   B       C      A     A       A     C       B     C      --     --                                                   --      --    --     1                                                          A       B      A     A                                                     A     C       --    C      B                                                 -- --  -- --  1                        A H B A --C --C  -- --  -- --  1                                             D B B B B C B C B -- --  -- --  1                                             0 0 0 1 4                                                                    V.(a) B C B B B C A C A -- --  1 -- --                                         A C B B -- --  A --A -- --  1 -- --                                           B B B B B C A C B -- --  -- --  1                                             B C B B B C A C B -- --  -- --  1                                             B C --B B C A C B -- --  -- --  1                                             D --A B B C --C B -- --  -- --  1                                         
   (b)     B    C    A    B    C    C    A    C    A      --   --   --   1    --
           B    C    --   B    C    C    --   C    D      --   --   --   --   1
                                                           0    0    2    1    5
Others     B    C    B    A    A    B    B    A    B      --   --   --   --   1
           B    C    B    A    A    D    B    A    B      --   --   --   --   1
           B    C    B    E    B    A    B    D    A      --   --   --   --   1
           B    C    A    B    B    E    --   C    --     --   --   --   --   1
           B    D    B    B    B    F    A    C    B      --   --   --   --   1
           A    C    --   A    A    C    B    D    A      --   --   --   --   1
           G    B    B    A    A    B/C  A    A/D  B      --   --   --   --   1
                                                           0    0    0    0    7
Unclassified:
           --   --   --   --   --   --   --   --   --     4    10   2    18   6
Total:                                                     62   15   24   27   98
__________________________________________________________________________
(a) The extended haplotype data are derived from the CF families used in previous linkage studies (see footnote (a) of Table 3) with additional CF-PS families collected subsequently (Kerem et al, Am. J. Genet. 44:827 (1989)). The data are shown in groups (regions) to reduce space. The regions are assigned primarily according to pairwise association data shown in Table 4, with regions 6-8 spanning the putative CF locus (the F508 deletion is between regions 6 and 7). A dash (--) is shown at the region where the haplotype has not been determined due to incomplete data or inability to establish phase. Alternative haplotype assignments are also given where data are incomplete. Unclassified includes those chromosomes with more than 3 unknown assignments. The haplotype definitions for each of the 9 regions are:

Region 1-     metD     metD     metH
              BanI     TaqI     TaqI
__________________________________________________________________________
  A =         1        1        1
  B =         2        1        2
  C =         1        1        2
  D =         2        2        1
  E =         1        2        --
  F =         2        1        1
  G =         2        2        2
__________________________________________________________________________
Region 2-     E6       E7       pH131    W3D1.4
              TaqI     TaqI     HinfI    HindIII
__________________________________________________________________________
  A =         1        2        2        2
  B =         2        1        1        1
  C =         1        2        1        1
  D =         2        1        2        2
  E =         2        2        2        1
  F =         2        2        1        1
  G =         1        2        1        2
  H =         1        1        2        2
__________________________________________________________________________
Region 3-     H2.3A
              TaqI
__________________________________________________________________________
  A =         1
  B =         2
__________________________________________________________________________
Region 4-     EG1.4    EG1.4    JG2E1
              HincII   BglI     PstI
__________________________________________________________________________
  A =         1        1        2
  B =         2        2        1
  C =         2        2        2
  D =         1        1        1
  E =         1        2        1
__________________________________________________________________________
Region 5-     E2.6     E2.8     E4.1
              MspI     NcoI     MspI
__________________________________________________________________________
  A =         2        1        2
  B =         1        2        1
  C =         2        2        2
__________________________________________________________________________
Region 6-     J44      10-1X.6  10-1X.6
              XbaI     AccI     HaeIII
__________________________________________________________________________
  A =         1        2        1
  B =         2        1        2
  C =         1        1        2
  D =         1        2        2
  E =         2        2        2
  F =         2        2        1
__________________________________________________________________________
Region 7-     T6/20
              MspI
__________________________________________________________________________
  A =         1
  B =         2
__________________________________________________________________________
Region 8-     H1.3     CE1.0
              NcoI     NdeI
__________________________________________________________________________
  A =         2        1
  B =         1        2
  C =         1        1
  D =         2        2
__________________________________________________________________________
Region 9-     J32      J3.11    J29
              SacI     MspI     PvuII
__________________________________________________________________________
  A =         1        1        1
  B =         2        2        2
  C =         2        1        2
  D =         2        2        1
  E =         2        1        1
__________________________________________________________________________
(b) Number of chromosomes scored in each class:
  CF-PI(F) = CF chromosomes from CF-PI patients with the F508 deletion;
  CF-PS(F) = CF chromosomes from CF-PS patients with the F508 deletion;
  CF-PI = Other CF chromosomes from CF-PI patients;
  CF-PS = Other CF chromosomes from CF-PS patients;
  N = Normal chromosomes derived from carrier parents

It was apparent that most recombinations between haplotypes occurred between regions 1 and 2 and between regions 8 and 9, again in good agreement with the relatively long physical distance between these regions. Other, less frequent, breakpoints were noted between short distance intervals, and they generally corresponded to the hot spots identified by the pairwise allelic association studies shown above. It is of interest to note that the F508 deletion associated almost exclusively with Group I, the most frequent CF haplotype, supporting the position that this deletion constitutes the major mutation in CF. More importantly, while the F508 deletion was detected in 89% (62/70) of the CF chromosomes with the AA haplotype (corresponding to the two regions, 6 and 7, flanking the deletion), it was not found in any of the 14 N chromosomes within the same group (χ² = 47.3, p < 10⁻⁴). The F508 deletion was therefore not a sequence polymorphism associated with the core of the Group I haplotype (see Table 5).
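The reported statistic can be checked from the counts given above. The following is a minimal sketch, assuming the value quoted is an uncorrected Pearson chi-square (1 degree of freedom) on the 2x2 table of deletion status against chromosome class; the function name is illustrative and not part of the original analysis.

```python
def pearson_chi_square(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Group I chromosomes carrying the AA haplotype at regions 6 and 7:
# rows are CF and N chromosomes, columns are deletion present / absent.
cf_with, cf_without = 62, 70 - 62
n_with, n_without = 0, 14

chi2 = pearson_chi_square(cf_with, cf_without, n_with, n_without)
print(f"chi-square = {chi2:.1f}")
```

Under that assumption the computed value rounds to 47.3, in agreement with the figure stated in the text.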

Together, the results of the oligonucleotide hybridization study and the haplotype analysis support the conclusion that the gene locus described here is the CF gene and that the 3 bp (F508) deletion is the most common mutation in CF.

3.6 Intron/Exon Boundaries

The entire genomic CF gene includes all of the regulatory genetic information as well as the intron genetic information that is spliced out in the expression of the CF gene. Portions of the introns at the intron/exon boundaries of the CF gene are very helpful in locating mutations in the CF gene, as they permit PCR analysis from genomic DNA. Genomic DNA can be obtained from any tissue, including leukocytes from blood. Such intron information can be employed in PCR analysis for purposes of CF screening, which will be discussed in more detail in a later section. As set out in FIG. 18 under the headings "Exon 1 through Exon 24", portions of the bounding introns are shown, in particular those that flank the exons and are essential for PCR exon amplification.

Further assistance in interpreting the information of FIG. 18 is provided in FIG. 21. Genomic DNA clones containing the coding region of the CFTR gene are provided. As is apparent from FIG. 21, there are considerable gaps between the clones of the exons, which correspond to the gaps in the intron portions between the exons of FIG. 18. These gaps in the intron portions are indicated by " . . . ". In FIG. 21, the clones were mapped using different restriction endonucleases (AccI, A; AvaI, W; BamHI, B; BglII, G; BssHI, Y; EcoRV, V; FspI, F; HincII, C; HindIII, H; KpnI, K; NcoI, J; PstI, P; PvuII, U; SmaI, M; SacI, S; SspI, E; StyI, T; XbaI, X; XhoI, O). In FIG. 21, the exons are represented by boxed regions. The open boxes indicate non-coding portions of the exons, whereas closed boxes indicate coding portions. The probable positions of the exons within the genomic DNA are also indicated by their relative positions. The arrows above the boxes mark the location of the oligonucleotides used as sequencing primers in the PCR amplification of the genomic DNA. The numbers provided beneath the restriction map represent the size of the restriction fragments in kb.

In sequencing the intron portions, it has been determined that there are at least 27 exons instead of the 24 exons previously reported in applicants' aforementioned co-pending applications. Exons 6, 14 and 17, as previously reported, are found to be in segments and are now named exons 6a and 6b, exons 14a and 14b, and exons 17a and 17b.
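The revised exon count follows directly from the renaming. As a minimal sketch (the function and variable names are illustrative only), splitting each of the three segmented exons of the original 24 into an "a" and a "b" segment yields 27 exon names:

```python
# Exons reported above as being split into two segments each.
SPLIT_EXONS = frozenset({6, 14, 17})

def exon_names(n_original=24, split=SPLIT_EXONS):
    """List exon names after splitting the given exons into 'a' and 'b' segments."""
    names = []
    for i in range(1, n_original + 1):
        if i in split:
            names.extend([f"{i}a", f"{i}b"])
        else:
            names.append(str(i))
    return names

names = exon_names()
print(len(names), names)
```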

The intron portions which have been used in PCR amplification are identified in the following Table 5 and underlined in FIG. 18. The portions identified by the arrows are selected, but it is understood that other portions of the intron sequences are also useful in the PCR amplification technique. For example, for exon 10 the relevant genetic information which is preferred in PCR is noted by reference to the 5' and 3' ends of the sequence. The intron section is identified with an "i". Hence, in Table 5 for exon 2, the preferred portions are identified by 2i-5 and 2i-3, and similarly for exons 3 through 24. For exon 1, the selected portions include the sequence GGA . . . AAA for B115-B and ACA . . . GTG for 10D. For exon 13, portions are identified by two sets: 13i-5 and C1-1M, and X13B-5 and 13i-3A. (Exon 13 is large and is most practically amplified in two sections.) C1-1M and X13B-5 are from exon sequences. The specific conditions for PCR amplification of individual exons are summarized in the following Table 6 and are discussed in more detail hereinafter with respect to the procedure explained in R. K. Saiki et al, Science 230:1350 (1985).

These oligonucleotides, as derived from the intron sequence, assist in amplifying by PCR the respective exon, thereby providing for analysis for DNA sequence alterations corresponding to mutations of the CF gene. The mutations can be revealed by either direct sequence determination of the PCR products or sequencing the products cloned in plasmid vectors. The amplified exon can also be analyzed by use of gel electrophoresis in the manner to be further described. It has been found that the sections of the intron for each respective exon are of sufficient length to work particularly well with PCR technique to provide for amplification of the relevant exon.

TABLE 5
______________________________________
Oligonucleotides used for amplification of CF gene exons by PCR

                                                    Amplified
Exon     PCR primers; 5' → 3'                       product (bp)
______________________________________
1        GGAGTTCACTCACCTAAA (B115-B)                933
         ACACGCCCTCCTCTTTCGTC (10D)
2        CCAAATCTGTATGGAGACCA (2i-5)                378
         TATGTTGCCCAGGCTGGTAT (2i-3)
3        CTTGGGTTAATCTCCTTGGA (3i-5)                309
         ATTCACCAGATTTCGTAGTC (3i-3)
4        TCACATATGGTATGACCCTC (4i-5)                438
         TTGTACCAGCTCACTACCTA (4i-3)
5        ATTTCTGCCTAGATGCTGGG (5i-5)                395
         AACTCCGCCTTTCCAGTTGT (5i-3)
6a       TTAGTGTGCTCAGAACCACG (6Ai-5)               385
         CTATGCATAGAGCAGTCCTG (6Ai-3)
6b       TGGAATGAGTCTGTACAGCG (6Ci-5)               417
         GAGGTGGAAGTCTACCATGA (6Ci-3)
7        AGACCATGCTCAGATCTTCCAT (7i-5)              410
         GCAAAGTTCATTAGAACTGATC (7i-3)
8        TGAATCCTAGTGCTTGGCAA (8i-5)                359
         TCGCCATTAGGATGAAATCC (8i-3)
9        TAATGGATCATGGGCCATGT (9i-5)                560
         ACAGTGTTGAATGTGGTGCA (9i-3)
10       GCAGAGTACCTGAAACAGGA (10i-5)               491
         CATTCACAGTAGCTTACCCA (10i-3)
11       CAACTGTGGTTAAAGCAATAGTGT (11i-5)           425
         GCACAGATTCTGAGTAACCATAAT (11i-3)
12       GTGAATCGATGTGGTGACCA (12i-5)               426
         CTGGTTTAGCATGAGGCGGT (12i-3)
13 (a)   TGCTAAAATACGAGACATATTGCA (13i-5)           528
         ATCTGGTACTAAGGACAG (C1-1M)
   (b)   TCAATCCAATCAACTCTATACGAA (X13B-5)          497
         TACACCTTATCCTAATCCTATGAT (13i-3A)
14a      AAAAGGTATGCCACTGTTAAG (14Ai-5)             511
         GTATACATCCCCAAACTATCT (14Ai-3)
14b      GAACACCTAGTACAGCTGCT (14Bi-5)              449
         AACTCCTGGGCTCAAGTGAT (14Bi-3)
15       GTGCATGCTCTTCTAATGCA (15i-5)               485
         AAGGCACATGCCTCTGTGCA (15i-3)
16       CAGAGAAATTGGTCGTTACT (16i-5)               570
         ATCTAAATGTGGGATTGCCT (16i-3)
17a      CAATGTGCACATGTACCCTA (17Ai-5)              579
         TGTACACCAACTGTGGTAAG (17Ai-3)
17b      TTCAAAGAATGGCACCAGTGT (17Bi-5)             463
         ATAACCTATAGAATGCAGCA (17Bi-3)
18       GTAGATGCTGTGATGAACTG (18i-5)               451
         AGTGGCTATCTATGAGAAGG (18i-3)
19       GCCCGACAAATAACCAAGTGA (19i-5)              454
         GCTAACACATTGCTTCAGGCT (19i-3)
20       GGTCAGGATTGAAAGTGTGCA (20i-5)              473
         CTATGAGAAAACTGCACTGGA (20i-3)
21       AATGTTCACAAGGGACTCCA (21i-5)               477
         CAAAAGTACCTGTTGCTCCA (21i-3)
22       AAACGCTGAGCCTCACAAGC (22i-5)               562
         TGTCACCATGAAGCAGGCAT (22i-3)
23       AGCTGATTGTGCGTAACGCT (23i-5)               400
         TAAAGCTGGATGGCTGTATG (23i-3)
24       GGACACAGCAGTTAAATGTG (24i-5)               569
         ACTATTGCCAGGAAGCCATT (24i-3)
______________________________________

TABLE 6
__________________________________________________________________________
                                                    Thermal cycle
                                 Initial       Denatur-
                                 denaturation  ation       Annealing   Extension      Final extension
Exon                Buffer(a)    time/temp     time/temp   time/temp   time/temp      time/temp
__________________________________________________________________________
6b, 7-10, 12,       A(1.5)       6 min/94° C.  30 sec/94° C.  30 sec/55° C.  1 min/72° C.    7 min/72° C.
14a, 16, 17b,
18-24
1                   B            6 min/94° C.  30 sec/94° C.  30 sec/55° C.  2.5 min/72° C.  7 min/72° C.
2, 11               B            6 min/94° C.  30 sec/94° C.  30 sec/52° C.  1 min/72° C.    7 min/72° C.
13a                 A(1.75)      6 min/94° C.  30 sec/94° C.  30 sec/54° C.  2.5 min/72° C.  7 min/72° C.
13b                 A(1.75)      6 min/94° C.  30 sec/94° C.  30 sec/52° C.  2.5 min/72° C.  7 min/72° C.
14b                 B            6 min/94° C.  30 sec/94° C.  30 sec/56° C.  1 min/72° C.    7 min/72° C.
17a                 A(1.5)       6 min/94° C.  30 sec/94° C.  30 sec/56° C.  1 min/72° C.    7 min/72° C.
__________________________________________________________________________
(a) Buffer A(1.5): * buffer with 1.5 mM MgCl₂
    Buffer A(1.75): * buffer with 1.75 mM MgCl₂
    Buffer B: 67 mM Tris-HCl pH 8.8, 6.7 mM MgCl₂, 16.6 mM (NH₄)₂SO₄, 0.67 μM EDTA, 10 mM β-mercaptoethanol, 170 μg/ml BSA, 10% DMSO, 1.5 mM of each dNTP
* Buffer A contains:
    10 mM Tris pH 8.3 (@ 25° C.)
    50 mM KCl
    0.001% (w/w) gelatin
    0.2 mM dNTPs
dNTPs = deoxynucleotide triphosphates

3.7 CF Mutations--ΔI506 or ΔI507

The association of the F508 deletion with 1 common and 1 rare CF haplotype provided further insight into the number of mutational events that could contribute to the present patient population. Based on the extensive haplotype data, the original chromosome in which the F508 deletion occurred is likely to carry the haplotype --AAAAAAA-- (Group Ia), as defined in Table 4. The other Group I CF chromosomes carrying the deletion are probably recombination products derived from the original chromosome. If the CF chromosomes in each haplotype group are considered to be derived from the same origin, only 3-4 additional mutational events would be predicted (see Table 4). However, since many of the CF chromosomes in the same group are markedly different from each other, further subdivision within each group is possible. As a result, a higher number of independent mutational events could be considered and the data suggest that at least 7 additional, putative mutations also contribute to the CF-PI phenotype (see Table 3). The mutations leading to the CF-PS subgroup are probably more heterogeneous.

The 7 additional CF-PI mutations are represented by the haplotypes: --CAAAAAA-- (Group Ib), --CABCAAD-- (Group Ic), ------BBBAC-- (Group IIa), --CABBBAB-- (Group Va). Although the molecular defect in each of these mutations has yet to be defined, it is clear that none of these mutations severely affect the region corresponding to the oligonucleotide binding sites used in the PCR/hybridization experiment.

One CF chromosome hybridizing to the ΔF508-ASO probe, however, has been found to associate with a different haplotype (group IIIa). It initially appeared that ΔF508 had occurred in both haplotypes, but with the discovery of ΔI507 it is now clear that it had not. Instead, ΔF508 is in group Ia, whereas ΔI507 is in group IIIa. None of the other CF nor the normal chromosomes of this haplotype group (IIIa) have shown hybridization to the mutant (ΔF508) ASO [B. Kerem et al, Science 245:1073 (1989)]. In view of the group Ia and IIIa haplotypes being distinctly different from each other, the mutations harbored by these two groups of CF chromosomes must have originated independently. To investigate the molecular nature of the mutation in this group IIIa CF chromosome, we further characterized the region of interest through amplification of the genomic DNA from an individual carrying the chromosome IIIa by the polymerase chain reaction (PCR).

These polymerase chain reactions (PCR) were performed according to the procedure of R. K. Saiki et al, Science 230:1350 (1985). A specific DNA segment of 491 bp including exon 10 of the CF gene was amplified with the use of the oligonucleotide primers 10i-5 (5'-GCAGAGTACCTGAAACAGGA-3') SEQ ID NO: 10 and 10i-3 (5'-CATTCACAGTAGCTTACCCA-3') SEQ ID NO: 11 located in the 5' and 3' flanking regions, respectively, as shown in FIG. 18 and itemized in Table 5. Both oligonucleotides were purchased from the HSC DNA Biotechnology Service Center (Toronto). Approximately 500 ng of genomic DNA from cultured lymphoblastoid cell lines of the parents and the CF child of Family 5 were used in each reaction. The DNA samples were denatured at 94° C. for 30 sec., primers annealed at 55° C. for 30 sec., and extended at 72° C. for 50 sec. (with 0.5 unit of Taq polymerase, Perkin-Elmer/Cetus, Norwalk, Conn.) for 30 cycles and a final extension period of 7 min. in a Perkin-Elmer/Cetus DNA Thermal Cycler. Reaction conditions for PCR amplification of other exons are set out in Table 6.
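As a rough check on the 55° C. annealing step, the melting temperatures of the two exon 10 primers can be estimated. The following is a minimal sketch only: it uses the Wallace rule of thumb (2° C. per A or T, 4° C. per G or C), which is an assumption introduced here for illustration and is not part of the procedure described; the function name `wallace_tm` is hypothetical.

```python
def wallace_tm(primer: str) -> int:
    """Estimate primer Tm in deg C with the Wallace rule: 2*(A+T) + 4*(G+C).

    This is a rule of thumb for short oligonucleotides, not a measured value.
    """
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Primer sequences as listed for exon 10 (SEQ ID NO: 10 and 11).
PRIMERS = {
    "10i-5": "GCAGAGTACCTGAAACAGGA",
    "10i-3": "CATTCACAGTAGCTTACCCA",
}

for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, estimated Tm {wallace_tm(seq)} C")
```

Under this approximation both primers fall a few degrees above the 55° C. annealing temperature, which is in line with common practice, although the estimate ignores salt and primer concentration.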

Hybridization analysis of the PCR products from three individuals of Family 5 of group IIIa was performed. The carrier mother and father are represented by a half-filled circle and square, respectively, and the affected son is a filled square in FIG. 19a. The conditions for hybridization and washing have been previously described (Kerem et al, supra). There is a relatively weak signal in the father's PCR product with the mutant (oligo ΔF508) probe. In FIG. 19b, DNA sequence analysis of the clone 5-3-15 and the PCR products from the affected son and the carrier father are shown. The arrow in the center panel indicates the presence of both A and T nucleotide residues in the same position; the arrow in the right panel indicates the points of divergence between the normal and the ΔI507 sequence. The sequence ladders shown are derived from the reverse-complements as will be described later. FIG. 19c shows the DNA sequences and their corresponding amino acid sequences of the normal, ΔI507, and ΔF508 alleles spanning the mutation sites. With reference to FIG. 19a, the PCR-amplified DNA from the carrier father, who contributed the group IIIa CF chromosome to the affected son, hybridized less efficiently with the ΔF508 ASO than that from the mother, who carried the group Ia CF chromosome. The difference became apparent when the hybridization signals were compared to that with the normal ASO probe. This result therefore indicated that the mutation carried by the group IIIa CF chromosome might not be identical to ΔF508.

To define the nucleotide sequence corresponding to the mutant allele on this chromosome, the PCR-amplified product of the father's DNA was excised from a polyacrylamide-electrophoretic gel and cloned into a sequencing vector.

The general procedures for DNA isolation and purification for purposes of cloning into a sequencing vector are described in J. Sambrook, E. F. Fritsch, T. Maniatis, Molecular Cloning: A Laboratory Manual, 2nd ed. (Cold Spring Harbor Press, N.Y. 1989). The two homoduplexes generated by PCR amplification of the paternal DNA were purified from a 5% non-denaturing polyacrylamide gel (30:1 acrylamide:bis-acrylamide). The appropriate bands were visualized by staining with ethidium bromide, excised and eluted in TE (10 mM Tris-HCl; 1 mM EDTA; pH 7.5) for 2 to 12 hours at room temperature. The DNA solution was sequentially treated with Tris-equilibrated phenol, phenol/CHCl₃ and CHCl₃. The DNA samples were concentrated by precipitation in ethanol and resuspension in TE, incubated with T4 polynucleotide kinase in the presence of ATP, and ligated into dephosphorylated, blunt-ended Bluescript KS™ vector (Stratagene, San Diego, Calif.). Clones containing amplified product generated from the normal parental chromosome were identified by hybridization with the oligonucleotide N as described in Kerem et al, supra.

Clones containing the mutant sequence were identified by their failure to hybridize to the normal ASO (Kerem et al, supra). One clone, 5-3-15, was isolated and its DNA sequence determined. The general protocol for sequencing cloned DNA is essentially as described [J. R. Riordan et al, Science 245:1066 (1989)] with the use of a U.S. Biochemicals Sequenase™ kit. To verify the sequence and to exclude any errors introduced by DNA polymerase during PCR, the DNA sequences for the PCR products from the father and one of the affected children were also determined directly without cloning.

This procedure was accomplished by denaturing 2 pmoles of gel-purified double-stranded PCR product in 0.2 M NaOH/0.2 mM EDTA (5 min. at room temperature), neutralizing by adding 0.1 volume of 2 M ammonium acetate (pH 5.4) and precipitating with 2.5 volumes of ethanol at -70° C. for 10 min. After washing with 70% ethanol, the DNA pellet was dried and redissolved in a sequencing reaction buffer containing 4 pmoles of the oligonucleotide primer 10i-3 of FIG. 18, dithiothreitol (8.3 mM) and [α-35S]-dATP (0.8 μM, 1000 Ci/mmole). The mixture was incubated at 37° C. for 20 min., following which 2 μl of labelling mix, as included in the Sequenase™ kit, and then 2 units of Sequenase enzyme were added. Aliquots of the reaction mixture (3.5 μl) were transferred, without delay, to tubes each containing 2.5 μl of ddGTP, ddATP, ddTTP and ddCTP solutions (U.S. Biochemicals Sequenase kit) and the reactions were stopped by addition of the stop solution.

The DNA sequence for this mutant allele is shown in FIG. 19b. The data derived from the cloned DNA and direct sequencing of the PCR products of the affected child and the father are all consistent with a 3 bp deletion when compared to the normal sequence (FIG. 19c). The deletion of this 3 bp (ATC) at the I506 or I507 position results in the loss of an isoleucine residue from the putative CFTR, within the same ATP-binding domain where ΔF508 resides, but it is not evident whether this deleted amino acid corresponds to the position 506 or 507. Since the 506 and 507 positions are repeats, it is at present impossible to determine in which position the 3 bp deletion occurs. For convenience in later discussions, however, we refer to this deletion as ΔI507.
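The reason the two positions cannot be told apart can be illustrated with a short sketch. Only the tandem ATC codons are taken from the text; the flanking codons `LEFT` and `RIGHT` are hypothetical placeholders, and the helper name `delete_codon` is likewise introduced here for illustration.

```python
ILE = "ATC"                   # isoleucine codon deleted, per the text
LEFT, RIGHT = "AAA", "GGG"    # hypothetical placeholder flanking codons


def delete_codon(codons, index):
    """Return the DNA string that remains after removing one codon."""
    return "".join(c for i, c in enumerate(codons) if i != index)


# Tandem repeat: ... I506 I507 ...
normal = [LEFT, ILE, ILE, RIGHT]

del_506 = delete_codon(normal, 1)   # remove the first ATC
del_507 = delete_codon(normal, 2)   # remove the second ATC

print(del_506, del_507, del_506 == del_507)
```

Because the two codons are identical, both deletions leave the same nucleotide string, so sequence data alone cannot assign the deletion to position 506 or 507.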

The fact that the ΔI507 and ΔF508 mutations occur in the same region of the presumptive ATP-binding domain of CFTR is surprising. Although the entire sequence of the ΔI507 allele has not been examined, as has been done for ΔF508, the strategic location of the deletion argues that it is the responsible mutation for this allele. This argument is further supported by the observation that this alteration was not detected in any of the normal chromosomes studied to date (Kerem et al, supra). The identification of a second single amino acid deletion in the ATP-binding domain of CFTR also provides information about the structure and function of this protein. Since deletion of either the phenylalanine residue at position 508 or the isoleucine at position 507 is sufficient to affect the function of CFTR such that it causes CF disease, it is suggested that these residues are involved in the folding of the protein but not directly in the binding of ATP. That is, the length of the peptide is probably more important than the actual amino acid residues in this region. In support of this hypothesis, it has been found that the phenylalanine residue can be replaced by a serine, and the isoleucine at position 506 by valine, without apparent loss of function of CFTR.

When the nucleotide sequence of ΔI507 is compared to that of ΔF508 at the ASO-hybridizing region, it was noted that the difference between the two alleles was only an A→T change (FIG. 19c). This subtle difference thus explained the cross-hybridization of the ΔF508-ASO to ΔI507. These results therefore exemplified the importance of careful examination of both parental chromosomes in performing ASO-based genetic diagnosis. It has been determined that the ΔF508 and ΔI507 mutations can be distinguished by increasing the stringency of the oligonucleotide hybridization condition or by detecting the unique mobility of the heteroduplexes formed between each of these sequences and the normal DNA on a polyacrylamide gel. The stringency of hybridization can be increased by using a washing temperature of 45° C. instead of the prior 39° C. in the presence of 2×SSC (1×SSC=150 mM NaCl and 15 mM Na citrate).
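The single-base difference can be reproduced in a short sketch. The 18-nucleotide stretch below (read as codons 505 to 510, AAT ATC ATC TTT GGT GTT) and the choice of CTT as the three bases removed in ΔF508 are assumptions based on the commonly reported CFTR coding sequence; they are not taken from FIG. 19c and should be checked against it.

```python
# Assumed normal sequence, codons 505-510: AAT ATC ATC TTT GGT GTT
NORMAL = "AATATCATCTTTGGTGTT"


def delete(seq, start, length=3):
    """Remove `length` bases beginning at 0-based index `start`."""
    return seq[:start] + seq[start + length:]


d_i507 = delete(NORMAL, 6)   # removes ATC (codon 507)
d_f508 = delete(NORMAL, 8)   # removes CTT spanning codons 507/508 (assumed)

# Positions at which the two deletion alleles differ, as (index, dF508, dI507).
mismatches = [
    (i, a, b) for i, (a, b) in enumerate(zip(d_f508, d_i507)) if a != b
]

print(d_f508)
print(d_i507)
print(mismatches)
```

Under these assumptions the two alleles differ at a single position, A in ΔF508 against T in ΔI507, which is consistent with the cross-hybridization described and with the need for a more stringent wash to separate them.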

Identification of the ΔI507 and ΔF508 alleles by polyacrylamide gel electrophoresis is shown in FIG. 20. The PCR products were prepared from the three family members and separated on a 5% polyacrylamide gel as described above. A DNA sample from a known heterozygous ΔF508 carrier is included for comparison. With reference to FIG. 20, the banding pattern of the PCR-amplified genomic DNA from the father, who is the carrier of ΔI507, is clearly distinguishable from that of the mother, who is a carrier of the ΔF508 mutation. In this gel electrophoresis test, there were actually three individuals (the carrier father and the two affected sons in Family 5) who carried the ΔI507 deletion. Since they all belong to the same family, they only represent one single CF chromosome in our population analysis [Kerem et al, supra]. The two patients, who also inherited the ΔF508 mutation from their mother, showed typical symptoms of CF with pancreatic insufficiency. The father of this family was the only parent who carried this ΔI507 mutation; no other CF parents showed a reduced hybridization intensity signal with the ΔF508 mutant oligonucleotide probe or a peculiar heteroduplex pattern for the PCR product (as defined above) in the retrospective study. In addition, two representatives of the group IIIb and one of the group IIIc CF chromosomes from our collection [Kerem et al, supra] were sequenced, but none were found to contain ΔI507. Since the electrophoresis technique eliminates the need for probe-labelling and hybridization, it may prove to be the method of choice for detecting carriers on a large population scale [J. M. Rommens et al, Am. J. Hum. Genet. 46:395-396 (1990)].

The present data also indicate that there is a strict correlation between DNA marker haplotype and mutation in CF. The ΔF508 deletion is the most common CF mutation that occurred on a group Ia chromosome background [Kerem et al, supra]. The ΔI507 mutation is, however, rare in the CF population; the one group-IIIa CF chromosome carrying this deletion is the only example in our studied population (1/219). Since the group III haplotype is relatively common among the normal chromosomes (17/198), the ΔI507 deletion probably occurred recently. Additional studies with larger populations of different geographic and ethnic backgrounds should provide further insight in understanding the origins of these mutations.

3.8 Additional CF Mutations

Following the above procedures, other mutations in the CF gene have been identified. The following brief description of each identified mutation is based on the previously described procedures for locating the mutation involving use of PCR procedures. The mutations are given short form names. The numbering used in these abbreviations refers to either the DNA sequence or the amino acid sequence position of the mutation depending on the type of mutation. For example, splice mutations and frameshift mutations are defined using the DNA sequence position. Most other mutations derive their nomenclature from the amino acid residue position. The description of each mutation clarifies the nomenclature in any event.
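The naming convention can be summarized in a small classifier. This is an illustrative sketch only: the regular expressions, the category labels and the function name `classify` are assumptions made here to mirror the short form names used in this section, not a formal nomenclature.

```python
import re

# (kind, numbering basis, pattern); rules are checked in order, so the
# nonsense rule (ending in X) must precede the general missense rule.
RULES = [
    ("nonsense", "amino acid", re.compile(r"^[A-Z](\d+)X$")),
    ("missense", "amino acid", re.compile(r"^[A-Z](\d+)[A-Z]$")),
    ("amino acid deletion", "amino acid", re.compile(r"^Δ[A-Z](\d+)$")),
    ("frameshift", "nucleotide", re.compile(r"^(\d+) del [ACGT]$")),
    ("splice", "nucleotide", re.compile(r"^(\d+)[+-]\d+[ACGT]→[ACGT]$")),
    ("nucleotide substitution", "nucleotide", re.compile(r"^(\d+)[ACGT]→[ACGT]$")),
]


def classify(name):
    """Return (kind, numbering basis, position) for a short form name."""
    for kind, numbering, pattern in RULES:
        match = pattern.match(name)
        if match:
            return kind, numbering, int(match.group(1))
    raise ValueError(f"unrecognized mutation name: {name}")


for name in ["G542X", "G85E", "ΔF508", "556 del A", "711+1G→T", "129G→C"]:
    print(name, classify(name))
```

The sketch reflects the rule stated above: splice and frameshift names carry a DNA sequence position, while most other names carry an amino acid residue position.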

For example, mutations G542X, Q493X, 3659 del C and 556 del A result in shortened polypeptides significantly different from the single amino acid deletions or alterations. G542X and Q493X involve a polypeptide including only the first 541 and 492 amino acid residues, respectively, of the normal 1480 amino acid polypeptide. 3659 del C and 556 del A also involve shortened versions and will include additional amino acid residues. Mutations 711+1G→T and 1717-1G→A are predicted to lead to polypeptides which cannot as yet be exactly defined. They probably do lead to shortened polypeptides but could contain additional amino acids. DNA sequences encoding these mutant polypeptides will now contain intron sequence from the normal gene or possibly deleted exons.

3.8.0 Mutations in Exon 1

In the 129G→C mutation, there is a single basepair change of G to C at nucleotide 129 of the cDNA sequence of FIG. 1. The PCR product for amplifying genomic DNA containing this mutation is derived from the B115-B and 10D primers as set out in Table 5. The genomic DNA is amplified as per the conditions of Table 6.

3.8.1 Mutations in Exon 3

The G85E mutation in exon 3 involves a G to A transition at nucleotide position 386. It is detected in family #26, a French Canadian family classified as PI. This predicted Gly to Glu amino acid change is associated with a group IIb haplotype. The mutation destroys a HinfI site. The PCR product derived from the 3i-5 and 3i-3 primers, as per conditions of Table 6, is cleaved by this enzyme into 3 fragments of 172, 105 and 32 bp, respectively, for the normal sequence; a fragment of 277 bp would be present for the mutant sequence. We analyzed 54 CF chromosomes, 8 from group II, and 50 normal chromosomes, 44 from group II, and did not find another example of G85E.
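The fragment sizes quoted for the HinfI digest can be checked with a few lines of arithmetic. In the sketch below the cut positions (172 and 277) are assumptions chosen so that the fragments come out in the order 172, 105, 32; the text gives only the sizes, not the order of the fragments along the 309 bp product. The function name `fragments` is hypothetical.

```python
def fragments(length, cut_sites):
    """Fragment sizes from cutting a linear product at the given positions."""
    edges = [0] + sorted(cut_sites) + [length]
    return [b - a for a, b in zip(edges, edges[1:])]


PRODUCT = 309               # exon 3 PCR product size, per Table 5
NORMAL_SITES = [172, 277]   # assumed positions of the two HinfI sites
MUTANT_SITES = [277]        # G85E removes the site joining the 172 and 105 bp pieces

print(fragments(PRODUCT, NORMAL_SITES))
print(fragments(PRODUCT, MUTANT_SITES))
```

Losing the site between the 172 and 105 bp fragments merges them into the 277 bp fragment reported for the mutant sequence, while the 32 bp fragment is unchanged.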

3.8.2 Mutations in Exon 4

556 del A is a frameshift mutation in exon 4 in a single CF chromosome (Toronto family #17, GM1076). There is a deletion of A at nucleotide position 556. This mutation is associated with the Group IIIb haplotype and is not found in 31 other CF chromosomes (9 from IIIb) and 30 N chromosomes (16 from IIIb). The mutation creates a BglI enzyme cleavage site. The PCR primers are 4i-5 and 4i-3 (see Table 5), where the enzyme cuts the mutant PCR product (437 bp) into 2 fragments of 287 and 150 bp in size.

The I148T mutation in exon 4 involves a T to C basepair transition at nucleotide position 575. This results in an Ile to Thr change at amino acid position 148 of FIG. 1. The PCR product used in amplifying genomic DNA containing this mutation uses primers 4i-5 and 4i-3 as set out in Table 5. The reaction conditions for amplifying the genomic DNA are set out in Table 6.

3.8.3 Mutations in Exon 5

In mutation G178R the Gly to Arg missense mutation in exon 5 is due to a G to A change at nucleotide position 664. The mutation is found on the mother's CF chromosome in family #50; the other mutation in this family is ΔF508. Primers 5i-5 and 5i-3 were used for amplifying genomic DNA as outlined in Tables 5 and 6.

3.8.4 Mutations in Exon 9

A mutation in exon 9 is a change of alanine (GCG) to glutamic acid (GAG) at amino acid position 455 (A455→E). Two of the 38 non-ΔF508 CF chromosomes examined carry this mutation; both of them are from patients of French-Canadian origin, which we have identified in our work as families #27 and #53, and they belong to haplotype group Ib. The mutation is detectable by allele-specific oligonucleotide (ASO) hybridization with PCR-amplified genomic DNA sequence. The PCR primers are 9i-5 (5'-TAATGGATCATGGGCCATGT-3') SEQ ID NO: 12 and 9i-3 (5'-ACAGTGTTGAATGTGGTGCA-3') SEQ ID NO: 13 for amplifying genomic DNA under the conditions of Table 6. The ASOs are 5'-GTTGTTGGCGGTTGCT-3' SEQ ID NO: 14 for the normal allele and 5'-GTTGTTGGAGGTTGCT-3' SEQ ID NO: 15 for the mutant. The oligonucleotide hybridization is as described in Kerem et al (1989) supra at 37° C. and the washings are done twice with 5×SSC for 10 min each at room temperature followed by twice with 2×SSC for 30 min each at 52° C. Although the alanine at position 455 (Ala455) is not present in all ATP-binding folds across species, it is present in all known members of the P-glycoprotein family, the protein most similar to CFTR. Further, A455→E is believed to be a mutation rather than a sequence polymorphism because the change is not found in 16 non-ΔF508 CF chromosomes and three normal chromosomes carrying the same group I haplotype.

3.8.5 Mutations in Exon 10

In the Q493X mutation Gln493 (CAG) is changed into a stop codon (TAG) in Toronto family #9 (nucleotide position 1609 C→T). The mutation occurs on a CF chromosome with haplotype IIIb; it is not found in 28 normal chromosomes (15 of which belong to IIb) nor in 33 other CF chromosomes (5 of which are IIIb). The mutation can be detected by allele-specific PCR, with 10i-5 as the common PCR primer, 5'-GGCATAATCCAGGAAAACTG-3' SEQ ID NO: 16 for the normal sequence and 5'-GGCATAATCCAGGAAAACTA-3' SEQ ID NO: 17 for the mutant allele. The PCR condition is 6 min at 94° followed by cycles of 30 sec at 94°, 30 sec at 57° and 90 sec at 72°, with 100 ng of each primer and ~400 ng genomic DNA. The primers 9i-3 and 9i-5 may be used for internal PCR control as they share the same reaction condition.
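The design of the two allele-specific primers can be examined directly. The sketch below assumes that both primers anneal to the strand opposite to 10i-5 (an inference from 10i-5 being the common primer in the 5' flanking region), so that the complement of each primer's 3' terminal base gives the coding-strand base at the mutation site.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

NORMAL_PRIMER = "GGCATAATCCAGGAAAACTG"   # SEQ ID NO: 16
MUTANT_PRIMER = "GGCATAATCCAGGAAAACTA"   # SEQ ID NO: 17

# 0-based positions at which the two primers differ.
diffs = [
    i for i, (a, b) in enumerate(zip(NORMAL_PRIMER, MUTANT_PRIMER)) if a != b
]

# Coding-strand bases implied by the 3' terminal base of each primer,
# assuming the primers are antisense.
normal_base = COMPLEMENT[NORMAL_PRIMER[-1]]
mutant_base = COMPLEMENT[MUTANT_PRIMER[-1]]

print(diffs, normal_base, "->", mutant_base)
```

The primers differ only at their 3' terminal nucleotide, and under the stated assumption that difference corresponds to the C→T change at nucleotide position 1609. Placing the discriminating base at the 3' end is what allows extension to proceed efficiently only from the matching primer.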

3.8.6 Mutations in Exon 11

In mutation G542X the glycine codon (GGA) at amino acid position 542 is changed to a stop codon (TGA) (G542→Stop). The single chromosome carrying this mutation is of Ashkenazic Jewish origin (family A) and has the B haplotype (XV2C allele 1; KM.19 allele 2). The mutant sequence can be detected by hybridization analysis with allele-specific oligonucleotides (ASOs) on genomic DNA amplified under conditions of Table 6 by PCR with the 11i-5 and 11i-3 oligonucleotide primers. The normal ASO is 5'-ACCTTCTCCAAGAACT-3' SEQ ID NO: 18 and the mutant ASO, 5'-ACCTTCTCAAAGAACT-3' SEQ ID NO: 19. The oligonucleotide hybridization condition is as described in Kerem et al (1989) supra and the washing conditions are twice in 5×SSC for 10 min. each at room temperature followed by twice in 2×SSC for 30 min. each at 45° C. The mutation is not detected in 52 other non-ΔF508 CF chromosomes, 11 of which are of Jewish origin (three have a B haplotype), nor in 13 normal chromosomes.
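The two ASOs for G542X appear to be written as the antisense strand, which can be checked by taking their reverse complements. In the sketch below the helper `reverse_complement` and the reading of the triplet beginning at the mismatch as codon 542 are assumptions made for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


NORMAL_ASO = "ACCTTCTCCAAGAACT"   # SEQ ID NO: 18
MUTANT_ASO = "ACCTTCTCAAAGAACT"   # SEQ ID NO: 19

normal_sense = reverse_complement(NORMAL_ASO)
mutant_sense = reverse_complement(MUTANT_ASO)

# First position at which the two sense-strand sequences differ.
pos = next(
    i for i, (a, b) in enumerate(zip(normal_sense, mutant_sense)) if a != b
)

print(normal_sense)
print(mutant_sense)
print(normal_sense[pos:pos + 3], "->", mutant_sense[pos:pos + 3])
```

Read this way, the triplet beginning at the mismatch is GGA in the normal sequence and TGA in the mutant, which agrees with the glycine to stop codon change described.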

In mutation S549R, the highly conserved serine residue of the nucleotide binding domain at position 549 is changed to arginine (S549→R); the codon change is AGT→AGG. The CF chromosome with this mutation is carried by a non-Ashkenazic Jewish patient from Morocco (family B). The chromosome also has the B haplotype. Detection of this mutation may be achieved by ASO hybridization or allele-specific PCR. In the ASO hybridization procedure, the genomic DNA sequence is first amplified under conditions of Table 6 by PCR with the 11i-5 and 11i-3 oligonucleotides; the ASO for the normal sequence is 5'-ACACTGAGTGGAGGTC-3' SEQ ID NO: 20 and that for the mutant is 5'-ACACTGAGGGGAGGTC-3' SEQ ID NO: 21. The oligonucleotide hybridization condition is as described by Kerem et al (1989) supra and the washings are done twice in 5×SSC for 10 min. each at room temperature followed by twice in 2×SSC for 30 min. each at 56° C. In the allele-specific PCR amplification, the oligonucleotide primer for the normal sequence is 5'-TGCTCGTTGACCTCCA-3' SEQ ID NO: 22, that for the mutant is 5'-TGCTCGTTGACCTCCC-3' SEQ ID NO: 23 and that for the common, outside sequence is 11i-5. The reaction is performed with 500 ng of genomic DNA, 100 ng of each of the oligonucleotides and 0.5 unit of Taq polymerase. The DNA template is first denatured by heating at 94° C. for 6 min., followed by 30 cycles of 94° for 30 sec, 55° for 30 sec and 72° for 60 sec. The reaction is completed by heating at 72° for 7 min. This S549→R mutation is not present in 52 other non-ΔF508 CF chromosomes, 11 of which are of Jewish origin (three have a B haplotype), nor in 13 normal chromosomes.

In the S549I mutation there is an AGT→ATT change (nucleotide position 1778 G→T), which represents the third mutation involving this amino acid codon and results in a loss of the DdeI site. We have only one example, from a patient of Arabic origin, which has been sequenced; no other DdeI-resistant chromosome is found in 5 other Arabic CF, 21 Jewish CF, 41 Canadian CF, and 13 Canadian normal chromosomes.

In mutation R560T the arginine (AAG) at amino acid position 560 is changed to threonine (AAC). The individual carrying this mutation (R560→T) is from a family we have identified in our work as family #32 and the chromosome is marked by haplotype IIIb. The mutation creates a MaeII site which cleaves the PCR product of exon 11 (generated with primers 11i-5 and 11i-3 under conditions of Table 6) into two fragments of 214 and 204 bp in size. None of the 36 non-ΔF508 CF chromosomes (seven of which have haplotype IIIb) or 23 normal chromosomes (16 have haplotype IIIb) carried this sequence alteration. The R560→T mutation is also not present on eight CF chromosomes with the ΔF508 mutation.

In mutation G551D glycine (G) at amino acid position 551 is changed to aspartic acid (D). G551 is a highly conserved residue within the ATP-binding fold. The corresponding codon change is from GGT to GAT. The G551→D change is found in 2 of our families (#1, #38) with pancreatic insufficient (PI) CF patients and 1 family (#54) with a pancreatic sufficient (PS) patient. The other CF chromosomes in families #1 and #38 carry the ΔF508 mutation and that in family #54 is unknown. Based on our "severe and mild mutation" hypothesis (Kerem et al. 1989), this mutation is expected to be a "severe" one. All 3 chromosomes carrying this mutation belong to Group IIIb. This G551→D substitution does not represent a sequence polymorphism because the change is not detected in 35 other CF chromosomes without the ΔF508 deletion (5 of them from group IIIb) and 19 normal chromosomes (including 5 from group IIIb). To detect this mutation, the genomic DNA region may be amplified under conditions of Table 6 by PCR with primers 11i-5 (5'-CAACTGTGGTTAAAGCAATAGTGT-3') SEQ ID NO: 24 and 11i-3 (5'-GCACAGATTCTGAGTAACCATAAT-3') SEQ ID NO: 25 and examined for the presence of a MboI (Sau3A) site created by the nucleotide change; the uncut (normal) form is 419 bp in length and the digestion products (from the mutant form) are 241 and 178 bp.
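The creation of the MboI (Sau3A) site can be illustrated with the normal ASO sequence given above for S549R (SEQ ID NO: 20), which spans the same region. The sketch assumes that this oligonucleotide is in the coding sense and reads in frame as ACA CTG AGT GGA GGT C, so that the final GGT is codon 551; that reading is an inference from the AGT→AGG change shown by the S549R ASO pair and is not stated in the text. GATC is the recognition sequence of MboI and Sau3A.

```python
MBOI_SITE = "GATC"                 # recognition sequence of MboI / Sau3A

NORMAL = "ACACTGAGTGGAGGTC"        # SEQ ID NO: 20, assumed sense strand
CODON_551_START = 12               # assumed 0-based start of codon 551 (GGT)


def substitute_codon(seq, start, new_codon):
    """Replace the three bases at `start` with `new_codon`."""
    return seq[:start] + new_codon + seq[start + 3:]


MUTANT = substitute_codon(NORMAL, CODON_551_START, "GAT")   # GGT -> GAT

print(NORMAL, MBOI_SITE in NORMAL)
print(MUTANT, MBOI_SITE in MUTANT)

# Fragment sizes quoted for the digest of the mutant product.
UNCUT = 419
DIGEST = (241, 178)
print(sum(DIGEST) == UNCUT)
```

Under these assumptions the GGT to GAT change, followed by the adjacent C, produces GATC in the mutant only, and the two digestion products add up to the uncut length.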

3.8.7 Mutations in Exon 12

In the Y563N mutation a T to A change is detected at nucleotide position 1820 in exon 12. This switch would result in a change from Tyr to Asn at amino acid position 563. It is found in a single family with 2 PS patients but the mutation in the other chromosome is unknown. We think Y563N is probably a missense mutation because (1) the T to A change is not found in 59 other CF chromosomes, with 8 having the same haplotype (IIa) and 30 having ΔF508; and (2) this alteration is not found in 54 normal chromosomes, with 39 having the IIa haplotype. Unfortunately, the amino acid change is not drastic enough to permit a strong argument. This putative mutation can be detected by ASO hybridization with a normal (5'-AGCAGTATACAAAGATGC-3') and a mutant (5'-AGCAGTAAACAAAGATGC-3') oligonucleotide probe. The washing condition is 54° C. with 2×SSC.

In the P574H mutation the C at nucleotide position 1853 is changed to A. Although the amino acid Pro at this position is not highly conserved across different ATP-binding folds, a change to His could be a drastic substitution. This change is not detected in 52 other CF chromosomes nor in 15 normal chromosomes, 4 of which have the same group IV haplotype. Based on these arguments, we believe P574H is a mutation. To detect this putative mutation, one may use the following ASOs: 5'-GACTCTCCTTTTGGA-3' SEQ ID NO: 26 for the normal and 5'-GACTCTCATTTTGGA-3' SEQ ID NO: 27 for the mutant. Washing should be done at 47° C. in 2×SSC.
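The normal and mutant ASO probes given above for Y563N and P574H each differ at a single base. As a hedged sketch (the helper name `aso_differences` is hypothetical), the position and identity of the discriminating base can be located by direct comparison of the two probe sequences:

```python
def aso_differences(normal, mutant):
    """List (1-based position, normal base, mutant base) where two probes differ.

    Both probes must be the same length, since the ASO pairs in the text
    describe substitutions rather than insertions or deletions.
    """
    if len(normal) != len(mutant):
        raise ValueError("probes must be the same length")
    return [
        (i + 1, n, m)
        for i, (n, m) in enumerate(zip(normal, mutant))
        if n != m
    ]


# Probe sequences copied from the text (5' to 3').
Y563N = ("AGCAGTATACAAAGATGC", "AGCAGTAAACAAAGATGC")
P574H = ("GACTCTCCTTTTGGA", "GACTCTCATTTTGGA")

if __name__ == "__main__":
    print(aso_differences(*Y563N))
    print(aso_differences(*P574H))
```

Each pair differs at one position only (T→A for Y563N, C→A for P574H), placed near the middle of the probe, where a mismatch is generally expected to destabilize the hybrid most.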

In the L1077P mutation, the T at nucleotide position 3362 is changed to C. This results in a change of the amino acid Leu to Pro at amino acid position 1077 in FIG. 1. As with the other mutations in this exon, the genomic DNA is amplified by use of the primers of Table 5, namely 17bi-5 and 17bi-3. The reaction conditions in amplifying the genomic DNA are set out in Table 6.

The Y1092X mutation involves a change of C at nucleotide position 3408 to A. This would result in protein synthesis termination at amino acid position 1092. Hence the amino acid Tyr is not present in the truncated polypeptide. As with the above procedures, the primers used in amplifying this mutation are 17bi-5 and 17bi-3.

3.8.8 Mutations in Exon 19

3659 del C is a frameshift mutation in exon 19 in a single CF chromosome (Toronto family #2); deletion of C at nucleotide position 3659 or 3660; haplotype IIa; not present in 57 non-ΔF508 CF chromosomes (7 from IIa) and 50 N chromosomes (43 from IIa); the deletion may be detected by PCR with a common oligonucleotide primer 19i-5 (see Table 5) and 2 ASO primers, HSC8 (5'-GTATGGTTTGGTTGACTTGG-3') SEQ ID NO: 28 for the normal and HSC9 (5'-GTATGGTTTGGTTGACTTGT-3') SEQ ID NO: 29 for the mutant allele; the PCR condition is as usual except that the annealing temperature is 60° C. to improve specificity.
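The two allele-specific primers HSC8 and HSC9 share a common 5' portion and diverge only at the 3' end, which is the end extended by the polymerase. The following minimal sketch, using a hypothetical helper `split_at_divergence`, makes that relationship explicit (the internal space in the printed HSC8 sequence is treated as a typesetting artifact and omitted):

```python
def split_at_divergence(primer_a, primer_b):
    """Return (shared 5' prefix, 3' tail of primer_a, 3' tail of primer_b)."""
    n = 0
    while n < min(len(primer_a), len(primer_b)) and primer_a[n] == primer_b[n]:
        n += 1
    return primer_a[:n], primer_a[n:], primer_b[n:]


# Primer sequences copied from the text (5' to 3').
HSC8_NORMAL = "GTATGGTTTGGTTGACTTGG"
HSC9_MUTANT = "GTATGGTTTGGTTGACTTGT"

if __name__ == "__main__":
    prefix, tail_normal, tail_mutant = split_at_divergence(HSC8_NORMAL, HSC9_MUTANT)
    print(len(prefix), tail_normal, tail_mutant)
```

The primers are identical over the first 19 bases and differ only in the 3'-terminal base, so each is expected to prime efficiently only on its matching allele; the raised annealing temperature noted above supports that discrimination.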

3.8.9 Mutations in Intron 4

In the 621+1G→T mutation there is a single bp change affecting the splice site (GT→TT) at the 3' end of exon 4; this mutation is detected in 5 French-Canadian CF chromosomes (one each in Toronto families #22, 23, 26, 36 and 53) but not in 33 other CF chromosomes (18 from the same group, group I) and 29 N chromosomes (13 from group I); the mutation creates a MseI site; genomic DNA may be amplified by the 2 intron primers, 4i-5 and 4i-3, and cut with MseI to distinguish the normal and mutant alleles; the normal would give 4 fragments of 33, 35, 71 and 298 bp in size; the 298 bp fragment in the mutant is cleaved by the enzyme to give fragments of 54 and 244 bp.
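The restriction patterns described for 621+1G→T, and for the G551D MboI assay discussed earlier, follow a common logic: a mutation that creates a site splits one normal fragment into two whose sizes sum to the original. A minimal sketch of the expected banding for each genotype follows; the function name `band_patterns` is hypothetical, and co-migrating or very small fragments may not resolve on an actual gel.

```python
def band_patterns(normal_fragments, cleaved, products):
    """Predict fragment sizes (bp) for normal, mutant and heterozygous samples.

    normal_fragments: sizes from the normal allele after digestion
    cleaved: the normal fragment that gains a site in the mutant
    products: the two sizes that the cleaved fragment is cut into
    """
    if cleaved not in normal_fragments:
        raise ValueError("cleaved fragment is not among the normal fragments")
    if sum(products) != cleaved:
        raise ValueError("products must add up to the cleaved fragment")
    normal = sorted(normal_fragments)
    mutant = list(normal)
    mutant.remove(cleaved)
    mutant = sorted(mutant + list(products))
    # A heterozygote carries both alleles, so it shows the union of bands.
    heterozygote = sorted(set(normal) | set(mutant))
    return {"normal": normal, "mutant": mutant, "heterozygote": heterozygote}


if __name__ == "__main__":
    # 621+1G->T, MseI digest of the 4i-5/4i-3 product (sizes from the text).
    print(band_patterns([33, 35, 71, 298], cleaved=298, products=(54, 244)))
    # G551D, MboI digest of the 11i-5/11i-3 product (sizes from the text).
    print(band_patterns([419], cleaved=419, products=(241, 178)))
```

For a carrier of 621+1G→T the sketch predicts that the 298 bp band persists alongside the new 54 and 244 bp bands, which is the feature that distinguishes a heterozygote from a homozygous mutant sample.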

3.8.10 Mutations in Intron 5

In the 711+1G→T mutation this G to T switch occurs at the splice junction after exon 5. The mutation is found on the mother's CF chromosome in family #22, a French Canadian family from Chicoutimi. The other mutation in this family is 621+1G→T.

3.8.11 Mutations in Intron 10

In the 1717-1G→A mutation a putative splice mutation is found in front of exon 11. This mutation is located at the last nucleotide of the intron before exon 11. The mutation may be detected with the following ASOs: normal ASO=5'-TTTGGTAATAGGACATCTCC-3' SEQ ID NO: 30; mutant ASO=5'-TTTGGTAATAAGACATCTCC-3' SEQ ID NO: 31. The washing conditions after hybridization are 5×SSC twice for 10 min at room temp, 2×SSC twice for 30 min at 47° C. for the mutant ASO and 2×SSC twice for 30 min at 48° C. for the normal ASO. We have only a single example, from an Arabic patient, and there is no haplotype data. The mutation is not found in 5 other Arabic, 21 Jewish, and 41 Canadian CF chromosomes, nor in 13 normal chromosomes.

3.9 DNA Sequence Polymorphisms

    ______________________________________
    Nucleotide position    Amino acid change
    ______________________________________
    1540 (A or G)          Met or Val
    1716 (G or A)          no change (Glu)
    2694 (T or G)          no change (Thr)
     356 (G or A)          Arg or Gln
    ______________________________________

A polymorphism is detected at nucleotide position 1540: the A residue can be substituted by G, changing the corresponding amino acid from Met to Val. At position 2694 the T residue can be a G, although this does not change the encoded amino acid. The polymorphism may be detected by restriction enzymes AvaII or Sau96I. These changes are present in the normal population and show good correlation with haplotypes but not with CF disease.

There can be a G to A change for the last nucleotide of exon 10 (nucleotide position 1716). We think that this nucleotide substitution is a sequence polymorphism because (a) it does not alter the amino acid, (b) it is unlikely to cause a splicing defect and (c) it occurs on some normal chromosomes. In two Canadian families, this rare allele is found associated with haplotype IIIb.

The more common nucleotide at 356 (G) is found to be changed to A in the father's normal chromosome in family #54. The amino acid changes from Arg to Gln.

4.0 CFTR Protein

As discussed with respect to the DNA sequence of FIG. 1, analysis of the sequence of the overlapping cDNA clones predicted an unprocessed polypeptide of 1480 amino acids with a molecular mass of 168,138 daltons. As later described, polymorphisms in the protein can cause its molecular weight to vary through substitution or deletion of certain amino acids. The molecular weight will also change due to the addition of carbohydrate units to form a glycoprotein. It is also understood that the functional protein in the cell will be similar to the unprocessed polypeptide, but may be modified due to cell metabolism.

Accordingly, purified normal CFTR polypeptide is characterized by a molecular weight of about 170,000 daltons and by epithelial cell transmembrane ion conductance activity. The normal CFTR polypeptide, which is substantially free of other human proteins, is encoded by the aforementioned DNA sequences and, according to one embodiment, that of FIG. 1. Such polypeptide displays the immunological or biological activity of normal CFTR polypeptide. As will be later discussed, the CFTR polypeptide and fragments thereof may be made by chemical or enzymatic peptide synthesis or expressed in an appropriate cultured cell system. The invention provides purified 507 mutant CFTR polypeptide which is characterized by cystic fibrosis-associated activity in human epithelial cells. Such 507 mutant CFTR polypeptide, which is substantially free of other human proteins, can be encoded by the 507 mutant DNA sequence.

4.1 Structure of CFTR

The most characteristic feature of the predicted protein is the presence of two repeated motifs, each of which consists of a set of amino acid residues capable of spanning the membrane several times followed by sequence resembling consensus nucleotide (ATP)-binding folds (NBFs) (FIGS. 11, 12 and 15). These characteristics are remarkably similar to those of the mammalian multidrug resistant P-glycoprotein and a number of other membrane-associated proteins, thus implying that the predicted CF gene product is likely to be involved in the transport of substances (ions) across the membrane and is probably a member of a membrane protein super family.

FIG. 13 is a schematic model of the predicted CFTR protein. In FIG. 13, cylinders indicate membrane spanning helices and hatched spheres indicate NBFs. The stippled sphere is the polar R-domain. The 6 membrane spanning helices in each half of the molecule are depicted as cylinders. The inner cytoplasmically oriented NBFs are shown as hatched spheres with slots to indicate the means of entry by the nucleotide. The large polar R-domain which links the two halves is represented by a stippled sphere. Charged individual amino acids within the transmembrane segments and on the R-domain surface are depicted as small circles containing the charge sign. Net charges on the internal and external loops joining the membrane cylinders and on regions of the NBFs are contained in open squares. Sites for phosphorylation by protein kinases A or C are shown by closed and open triangles respectively. K, R, H, D, and E are standard nomenclature for the amino acids lysine, arginine, histidine, aspartic acid and glutamic acid respectively.

Each of the predicted membrane-associated regions of the CFTR protein consists of 6 highly hydrophobic segments capable of spanning a lipid bilayer according to the algorithms of Kyte and Doolittle and of Garnier et al (J. Mol. Biol. 120, 97 (1978)) (FIG. 13). The membrane-associated regions are each followed by a large hydrophilic region containing the NBFs. Based on sequence alignment with other known nucleotide binding proteins, each of the putative NBFs in CFTR comprises at least 150 residues (FIG. 13). The 3 bp deletion at position 507 as detected in CF patients is located between the 2 most highly conserved segments of the first NBF in CFTR. The amino acid sequence identity between the region surrounding the isoleucine deletion and the corresponding regions of a number of other proteins suggests that this region is of functional importance (FIG. 15). A hydrophobic amino acid, usually one with an aromatic side chain, is present in most of these proteins at the position corresponding to I507 of the CFTR protein. It is understood that amino acid polymorphisms may exist as a result of DNA polymorphisms. Similarly, mutations at the other positions in the protein suggest that the corresponding regions of the protein are also of functional importance. Such additional mutations include substitutions of:

i) Glu for Gly at amino acid position 85;

ii) Thr for Ile at amino acid position 148;

iii) Arg for Gly at amino acid position 178;

iv) Glu for Ala at amino acid position 455;

v) stop codon for Gln at amino acid position 493;

vi) stop codon for Gly at amino acid position 542;

vii) Arg for Ser or Ile for Ser at amino acid position 549;

viii) Asp for Gly at amino acid position 551;

ix) Thr for Arg at amino acid position 560;

x) Asn for Tyr at amino acid position 563;

xi) His for Pro at amino acid position 574;

xii) Pro for Leu at amino acid position 1077;

xiii) stop codon for Tyr at amino acid position 1092.

FIG. 15 shows alignment of the 3 most conserved segments of the extended NBFs of CFTR with comparable regions of other proteins. These 3 segments consist of residues 433-473, 488-513, and 542-584 of the N-terminal half and 1219-1259, 1277-1302, and 1340-1382 of the C-terminal half of CFTR. The heavy overlining points out the regions of greatest similarity. Additional general homology can be seen even without the introduction of gaps.

Despite the overall symmetry in the structure of the protein and the sequence conservation of the NBFs, sequence homology between the two halves of the predicted CFTR protein is modest. This is demonstrated in FIG. 12, where amino acids 1-1480 are represented on each axis. Lines on either side of the identity diagonal indicate the positions of internal similarities. Therefore, while four sets of internal sequence identity can be detected as shown in FIG. 12, using the Dayhoff scoring matrix as applied by Lawrence et al. [C. B. Lawrence, D. A. Goldman, and R. T. Hood, Bull Math Biol. 48, 569 (1986)], three of these are only apparent at low threshold settings for standard deviation. The strongest identity is between sequences at the carboxyl ends of the NBFs. Of the 66 residues aligned, 27% are identical and another 11% are functionally similar. The overall weak internal homology is in contrast to the much higher degree (>70%) in P-glycoprotein, for which a gene duplication hypothesis has been proposed (Gros et al, Cell 47, 371, 1986, C. Chen et al, Cell 47, 381, 1986, Gerlach et al, Nature, 324, 485, 1986, Gros et al, Mol. Cell. Biol. 8, 2770, 1988). The lack of conservation in the relative positions of the exon-intron boundaries may argue against such a model for CFTR (FIG. 2).

Since there is apparently no signal-peptide sequence at the amino-terminus of CFTR, the highly charged hydrophilic segment preceding the first transmembrane sequence is probably oriented in the cytoplasm. Each of the 2 sets of hydrophobic helices is expected to form 3 traversing loops across the membrane, and little sequence of the entire protein is expected to be exposed to the exterior surface, except the region between transmembrane segments 7 and 8. It is of interest to note that the latter region contains two potential sites for N-linked glycosylation.

Each of the membrane-associated regions is followed by a NBF as indicated above. In addition, a highly charged cytoplasmic domain can be identified in the middle of the predicted CFTR polypeptide, linking the 2 halves of the protein. This domain, named the R-domain, is operationally defined by a single large exon in which 69 of the 241 amino acids are polar residues arranged in alternating clusters of positive and negative charges. Moreover, 9 of the 10 consensus sequences required for phosphorylation by protein kinase A (PKA), and 7 of the potential substrate sites for protein kinase C (PKC) found in CFTR, are located in this exon.

4.2 Function of CFTR

Properties of CFTR can be derived from comparison to other membrane-associated proteins (FIG. 15). In addition to the overall structural similarity with the mammalian P-glycoprotein, each of the two predicted domains in CFTR also shows remarkable resemblance to the single domain structure of hemolysin B of E. coli and the product of the White gene of Drosophila. These latter proteins are involved in the transport of the lytic peptide of the hemolysin system and of eye pigment molecules, respectively. The vitamin B12 transport system of E. coli, BtuD, and MbpX, which is a liverwort chloroplast gene whose function is unknown, also have a similar structural motif. Furthermore, the CFTR protein shares structural similarity with several of the periplasmic solute transport systems of gram negative bacteria, where the transmembrane region and the ATP-binding folds are contained in separate proteins which function in concert with a third substrate-binding polypeptide.

The overall structural arrangement of the transmembrane domains in CFTR is similar to several cation channel proteins and some cation-translocating ATPases as well as the recently described adenylate cyclase of bovine brain. The functional significance of this topological classification, consisting of 6 transmembrane domains, remains speculative.

Short regions of sequence identity have also been detected between the putative transmembrane regions of CFTR and other membrane-spanning proteins. Interestingly, there are also sequences, 18 amino acids in length, situated approximately 50 residues from the carboxyl terminus of CFTR and the raf serine/threonine kinase protooncogene of Xenopus laevis, which are identical at 12 of these positions.

Finally, an amino acid sequence identity (10/13 conserved residues) has been noted between a hydrophilic segment (position 701-713) within the highly charged R-domain of CFTR and a region immediately preceding the first transmembrane loop of the sodium channels in both rat brain and eel. The charged R-domain of CFTR is not shared with the topologically closely related P-glycoprotein; the 241 amino acid linking-peptide is apparently the major difference between the two proteins.

In summary, features of the primary structure of the CFTR protein indicate its possession of properties suitable to participation in the regulation and control of ion transport in the epithelial cells of tissues affected in CF. Secure attachment to the membrane in two regions serves to position its three major intracellular domains (nucleotide-binding folds 1 and 2 and the R-domain) near the cytoplasmic surface of the cell membrane, where they can modulate ion movement through channels formed either by CFTR transmembrane segments themselves or by other membrane proteins.

In view of the genetic data, the tissue-specificity, and the predicted properties of the CFTR protein, it is reasonable to conclude that CFTR is directly responsible for CF. It remains unclear, however, how CFTR is involved in the regulation of ion conductance across the apical membrane of epithelial cells.

It is possible that CFTR serves as an ion channel itself. As depicted in FIG. 13, 10 of the 12 transmembrane regions contain one or more amino acids with charged side chains, a property similar to the brain sodium channel and the GABA receptor chloride channel subunits, where charged residues are present in 4 of the 6, and 3 of the 4, respective membrane-associated domains per subunit or repeat unit. The amphipathic nature of these transmembrane segments is believed to contribute to the channel-forming capacity of these molecules. Alternatively, CFTR may not be an ion channel but instead serve to regulate ion channel activities. In support of the latter assumption, none of the purified polypeptides from trachea and kidney that are capable of reconstituting chloride channels in lipid membranes [Landry et al, Science 244: 1469 (1989)] appear to be CFTR if judged on the basis of the molecular mass.

In either case, the presence of ATP-binding domains in CFTR suggests that ATP hydrolysis is directly involved and required for the transport function. The high density of phosphorylation sites for PKA and PKC and the clusters of charged residues in the R-domain may both serve to regulate this activity. The deletion of a phenylalanine residue in the NBF may prevent proper binding of ATP or the conformational change which this normally elicits and consequently result in the observed insensitivity to activation by PKA- or PKC-mediated phosphorylation of the CF apical chloride conductance pathway. Since the predicted protein contains several domains and belongs to a family of proteins which frequently function as parts of multi-component molecular systems, CFTR may also participate in epithelial tissue functions of activity or regulation not related to ion transport.

With the isolated CF gene (cDNA) now in hand it is possible to define the basic biochemical defect in CF and to further elucidate the control of ion transport pathways in epithelial cells in general. Most important, knowledge gained thus far from the predicted structure of CFTR, together with the additional information from studies of the protein itself, provides a basis for the development of improved means of treatment of the disease. In such studies, antibodies have been raised to the CFTR protein as later described.

5.0 CF Screening

5.1 DNA Based Diagnosis

Given the knowledge of the 85, 148, 178, 455, 493, 507, 542, 549, 551, 560, 563, 574, 1077 and 1092 amino acid position mutations and the nucleotide sequence variants at DNA sequence positions 129, 556, 621+1, 711+1, 1717-1 and 3659 as disclosed herein, carrier screening and prenatal diagnosis can be carried out as follows.

The high risk population for cystic fibrosis is Caucasians. For example, each Caucasian woman and/or man of child-bearing age would be screened to determine if she or he was a carrier (approximately a 5% probability for each individual). If both are carriers, they are a couple at risk for a cystic fibrosis child. Each child of the at risk couple has a 25% chance of being affected with cystic fibrosis. The procedure for determining carrier status using the probes disclosed herein is as follows.
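The figures quoted above (a carrier probability of about 5% per individual and a 25% risk for each child of an at-risk couple) can be combined as a short worked calculation. The sketch below assumes, for illustration only, that the two partners' carrier statuses are independent; the function names are hypothetical.

```python
from fractions import Fraction

CARRIER_PROBABILITY = Fraction(1, 20)       # about 5% per individual (from the text)
AFFECTED_GIVEN_AT_RISK = Fraction(1, 4)     # 25% per child of two carriers


def at_risk_couple_probability(carrier=CARRIER_PROBABILITY):
    """Probability that both partners are carriers, assuming independence."""
    return carrier * carrier


def affected_child_probability(carrier=CARRIER_PROBABILITY):
    """Probability that a child of an unscreened couple is affected."""
    return at_risk_couple_probability(carrier) * AFFECTED_GIVEN_AT_RISK


if __name__ == "__main__":
    print(at_risk_couple_probability())   # 1/400
    print(affected_child_probability())   # 1/1600
```

Under these assumptions roughly 1 couple in 400 would be at risk, which illustrates why population screening must test many individuals to identify each at-risk couple.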

For purposes of brevity, the discussion on screening by use of one of the selected mutations is directed to the I507 mutation. It is understood that screening can also be accomplished using one of the other mutations, or using several of the mutations, in a screening process or mutation detection process of this section on CF screening involving DNA diagnosis and mutation detection.

One major application of the DNA sequence information of the normal and 507 mutant CF gene is in the area of genetic testing, carrier detection and prenatal diagnosis. Individuals carrying mutations in the CF gene (disease carriers or patients) may be detected at the DNA level with the use of a variety of techniques. The genomic DNA used for the diagnosis may be obtained from body cells, such as those present in peripheral blood, urine, saliva, tissue biopsy, surgical specimen and autopsy material. The DNA may be used directly for detection of specific sequence or may be amplified enzymatically in vitro by using PCR [Saiki et al. Science 230: 1350-1353, (1985), Saiki et al. Nature 324: 163-166 (1986)] prior to analysis. RNA or its cDNA form may also be used for the same purpose. Recent reviews of this subject have been presented by Caskey [Science 236: 1223-8 (1987)] and by Landegren et al [Science 242: 229-237 (1988)].

The detection of specific DNA sequence may be achieved by methods such as hybridization using specific oligonucleotides [Wallace et al. Cold Spring Harbour Symp. Quant. Biol. 51: 257-261 (1986)], direct DNA sequencing [Church and Gilbert, Proc. Nat. Acad. Sci. U. S. A. 81: 1991-1995 (1984)], the use of restriction enzymes [Flavell et al. Cell 15: 25 (1978), Geever et al Proc. Nat. Acad. Sci. U. S. A. 78: 5081 (1981)], discrimination on the basis of electrophoretic mobility in gels with denaturing reagent [Myers and Maniatis, Cold Spring Harbour Sym. Quant. Biol. 51: 275-284 (1986)], RNase protection [Myers, R. M., Larin, J., and T. Maniatis Science 230: 1242 (1985)], chemical cleavage [Cotton et al Proc. Nat. Acad. Sci. U. S. A. 85: 4397-4401 (1988)] and the ligase-mediated detection procedure [Landegren et al Science 241: 1077 (1988)].

Oligonucleotides specific to normal or mutant sequences are chemically synthesized using commercially available machines, labelled radioactively with isotopes (such as ³²P) or non-radioactively (with tags such as biotin (Ward and Langer et al. Proc. Nat. Acad. Sci. U. S. A. 78: 6633-6657 (1981))), and hybridized to individual DNA samples immobilized on membranes or other solid supports by dot-blot or transfer from gels after electrophoresis. The presence or absence of these specific sequences is visualized by methods such as autoradiography or fluorometric (Landegren et al, supra) or colorimetric reactions (Gebeyehu et al. Nucleic Acids Research 15: 4513-4534 (1987)). An embodiment of this oligonucleotide screening method has been applied in the detection of the I507 deletion as described herein.

Sequence differences between normal and mutants may be revealed by the direct DNA sequencing method of Church and Gilbert (supra). Cloned DNA segments may be used as probes to detect specific DNA segments. The sensitivity of this method is greatly enhanced when combined with PCR [Wrischnik et al, Nucleic Acids Res. 15: 529-542 (1987); Wong et al, Nature 330: 384-386 (1987); Stoflet et al, Science 239: 491-494 (1988)]. In the latter procedure, a sequencing primer which lies within the amplified sequence is used with double-stranded PCR product or single-stranded template generated by a modified PCR. The sequence determination is performed by conventional procedures with radiolabeled nucleotides or by automatic sequencing procedures with fluorescent tags.

Sequence alterations may occasionally generate fortuitous restriction enzyme recognition sites which are revealed by the use of appropriate enzyme digestion followed by conventional gel-blot hybridization (Southern, J. Mol. Biol 98: 503 (1975)). DNA fragments carrying the site (either normal or mutant) are detected by their reduction in size or increase of corresponding restriction fragment numbers. Genomic DNA samples may also be amplified by PCR prior to treatment with the appropriate restriction enzyme; fragments of different sizes are then visualized under UV light in the presence of ethidium bromide after gel electrophoresis.

Genetic testing based on DNA sequence differences may be achieved by detection of alteration in electrophoretic mobility of DNA fragments in gels with or without denaturing reagent. Small sequence deletions and insertions can be visualized by high resolution gel electrophoresis. For example, the PCR product with the 3 bp deletion is clearly distinguishable from the normal sequence on an 8% non-denaturing polyacrylamide gel. DNA fragments of different sequence compositions may be distinguished on denaturing formamide gradient gel in which the mobilities of different DNA fragments are retarded in the gel at different positions according to their specific "partial-melting" temperatures (Myers, supra). In addition, sequence alterations, in particular small deletions, may be detected as changes in the migration pattern of DNA heteroduplexes in non-denaturing gel electrophoresis, as have been detected for the 3 bp (I507) mutation and in other experimental systems [Nagamine et al, Am. J. Hum. Genet, 45: 337-339 (1989)]. Alternatively, a method of detecting a mutation comprising a single base substitution or other small change could be based on differential primer length in a PCR. For example, one invariant primer could be used in addition to a primer specific for a mutation. The PCR products of the normal and mutant genes can then be differentially detected in acrylamide gels.

Sequence changes at specific locations may also be revealed by nuclease protection assays, such as RNase (Myers, supra) and S1 protection (Berk, A. J., and P. A. Sharp Proc. Nat. Acad. Sci. U. S. A. 75: 1274 (1978)), the chemical cleavage method (Cotton, supra) or the ligase-mediated detection procedure (Landegren supra).

In addition to conventional gel-electrophoresis and blot-hybridization methods, DNA fragments may also be visualized by methods where the individual DNA samples are not immobilized on membranes. The probe and target sequences may be both in solution or the probe sequence may be immobilized [Saiki et al, Proc. Natl. Acad. Sci USA, 86: 6230-6234 (1989)]. A variety of detection methods, such as autoradiography involving radioisotopes, direct detection of radioactive decay (in the presence or absence of scintillant), spectrophotometry involving colorigenic reactions and fluorometry involving fluorogenic reactions, may be used to identify specific individual genotypes.

Since more than one mutation is anticipated in the CF gene, such as I507 and F508, a multiplex system is an ideal protocol for screening CF carriers and detection of specific mutations. For example, a PCR with multiple, specific oligonucleotide primers and hybridization probes may be used to identify all possible mutations at the same time (Chamberlain et al. Nucleic Acids Research 16: 1141-1155 (1988)). The procedure may involve immobilized sequence-specific oligonucleotide probes (Saiki et al, supra).

5.2 Detecting the CF 507 Mutation

These detection methods may be applied to prenatal diagnosis using amniotic fluid cells, chorionic villi biopsy or sorting fetal cells from maternal circulation. The test for CF carriers in the population may be incorporated as an essential component in a broad-scale genetic testing program for common diseases.

According to an embodiment of the invention, the portion of the DNA segment that is informative for a mutation (in this embodiment, the portion that immediately surrounds the I507 deletion) can then be amplified by using standard PCR techniques [as reviewed in Landegren, Ulf, Robert Kaiser, C. Thomas Caskey, and Leroy Hood, DNA Diagnostics--Molecular Techniques and Automation, in Science 242: 229-237 (1988)]. It is contemplated that the portion of the DNA segment which is used may be a single DNA segment or a mixture of different DNA segments. A detailed description of this technique now follows.

A specific region of genomic DNA from the person or fetus is to be screened. Such specific region is defined by the oligonucleotide primers C16B (5'GTTTTCCTGGATTATGCCTGGCAC3') SEQ ID NO:9 and C16D (5'GTTGGCATGCTTTGATGACGCTTC3') SEQ ID NO:10 or as shown in FIG. 18 by primers 10i-5 and 10i-3. The specific regions using 10i-5 and 10i-3 were amplified by the polymerase chain reaction (PCR). 200-400 ng of genomic DNA, from either cultured lymphoblasts or peripheral blood samples of CF individuals and their parents, were used in each PCR with the oligonucleotide primers indicated above. The oligonucleotides were purified with Oligonucleotide Purification Cartridges™ (Applied Biosystems) or NENSORB™ PREP columns (Dupont) with procedures recommended by the suppliers. The primers were annealed at 55° C. for 30 sec, extended at 72° C. for 60 sec (with 2 units of Taq DNA polymerase) and denatured at 94° C. for 60 sec, for 30 cycles with a final cycle of 7 min for extension in a Perkin-Elmer/Cetus automatic thermocycler with a Step-Cycle program (transition setting at 1.5 min). Portions of the PCR products were separated by electrophoresis on 1.4% agarose gels and transferred to Zetabind™ (Biorad) membrane according to standard procedures.
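The cycling conditions stated above can be written down as a small data structure, which also allows the programmed hold time to be totalled. This is a sketch only: the class and function names are hypothetical, the temperature of the final 7 min extension is assumed to be 72° C. (the text gives only its duration), and transition (ramp) times between steps are deliberately excluded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


# One cycle, in the order given in the text.
CYCLE = (
    Step("anneal", 55, 30),
    Step("extend", 72, 60),
    Step("denature", 94, 60),
)
N_CYCLES = 30
# Assumption: final extension at the same 72 C used for extension.
FINAL_EXTENSION = Step("final extension", 72, 7 * 60)


def total_hold_seconds(cycle=CYCLE, n_cycles=N_CYCLES, final=FINAL_EXTENSION):
    """Sum of programmed hold times, ignoring ramping between temperatures."""
    return n_cycles * sum(step.seconds for step in cycle) + final.seconds


if __name__ == "__main__":
    seconds = total_hold_seconds()
    print(seconds, "s =", seconds / 60, "min")
```

The programmed holds alone come to 82 min; the actual run would be longer once the transition setting of the Step-Cycle program is taken into account.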

The normal and ΔI507 oligonucleotide probes of FIG. 19 (10 ng each) are labeled separately with 10 units of T4 polynucleotide kinase (Pharmacia) in a 10 μl reaction containing 50 mM Tris-HCl (pH 7.6), 10 mM MgCl₂, 0.5 mM dithiothreitol, 10 mM spermidine, 1 mM EDTA and 30-40 μCi of γ[³²P]-ATP for 20-30 min at 37° C. The unincorporated radionucleotides are removed with a Sephadex G-25 column before use. The hybridization conditions are as described previously (J. M. Rommens et al Am. J. Hum. Genet. 43, 645 (1988)) except that the temperature can be 37° C. The membranes are washed twice at room temperature with 5×SSC and twice at 39° C. with 2×SSC (1×SSC=150 mM NaCl and 15 mM Na citrate). Autoradiography is performed at room temperature overnight. Autoradiographs are developed to show the hybridization results of genomic DNA with the 2 specific oligonucleotide probes. Probe C normal detects the normal DNA sequence and Probe C ΔI507 detects the mutant sequence.
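Because the wash stringencies are expressed as multiples of SSC, the corresponding salt concentrations follow directly from the definition 1×SSC = 150 mM NaCl and 15 mM Na citrate. A minimal sketch (the helper name `ssc_composition` is hypothetical):

```python
# Composition of 1x SSC in mM, as defined in the text.
SSC_1X_MM = {"NaCl": 150, "Na citrate": 15}


def ssc_composition(strength):
    """Return the mM concentration of each component at the given SSC strength."""
    if strength <= 0:
        raise ValueError("strength must be positive")
    return {component: mm * strength for component, mm in SSC_1X_MM.items()}


if __name__ == "__main__":
    print("5x SSC:", ssc_composition(5))  # room-temperature washes
    print("2x SSC:", ssc_composition(2))  # 39 C washes
```

The lower-salt 2×SSC wash at elevated temperature is the more stringent of the two steps and is the one expected to discriminate the matched probe from the mismatched probe.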

A genomic DNA sample from each family member can, as explained, be amplified by the polymerase chain reaction using the intron sequences of FIG. 18, and the products separated by electrophoresis on a 1.4% agarose gel and then transferred to Zetabind (Biorad) membrane according to standard procedures. The 3 bp deletion of ΔI507 can be revealed by a very convenient polyacrylamide gel electrophoresis procedure. When the PCR products generated by the above-mentioned 10i-5 and 10i-3 primers are applied to a 5% polyacrylamide gel and electrophoresed for 3 hrs at 20 V/cm in a 90 mM Tris-borate buffer (pH 8.3), DNA fragments of a different mobility are clearly detectable for individuals without the 3 bp deletion, heterozygous or homozygous for the deletion.

As already explained with respect to FIG. 20, the PCR amplified genomic DNA can be subjected to gel electrophoresis to identify the 3 bp deletion. As shown in FIG. 20, of the four lanes the first lane is a control with a normal/ΔF508 deletion. The next lane is the father with a normal/ΔI507 deletion. The third lane is the mother with a normal/ΔF508 deletion and the fourth lane is the child with a ΔF508/ΔI507 deletion. The homoduplexes show up as solid bands across the base of each lane. In lanes 1 and 3, the two heteroduplexes show up very clearly as two spaced apart bands. In lane 2, the father's ΔI507 mutation shows up very clearly, whereas in the fourth lane, the child with the adjacent 507, 508 mutations, there are no distinguishable heteroduplexes. Hence the signal appears only at the homoduplex line. Since the father in lane 2 and the mother in lane 3 show heteroduplex banding and the child does not, this indicates that the child is either normal or a patient. This can be further checked if needed, such as in embryonic analysis, by mixing the 507 and 508 probes to determine the presence of the ΔI507 and ΔF508 mutations.
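The reading of the FIG. 20 lanes described above can be summarized as a small decision rule. The sketch below encodes only what the text reports for these three alleles; the allele labels and function names are hypothetical, and the rule is not a general model of heteroduplex mobility.

```python
NORMAL, DF508, DI507 = "N", "dF508", "dI507"


def heteroduplex_band_expected(allele_a, allele_b):
    """Whether distinguishable heteroduplex bands are expected, per FIG. 20.

    - identical alleles cannot form heteroduplexes;
    - the adjacent dF508/dI507 pair gave no distinguishable heteroduplexes;
    - normal paired with either deletion gave clear heteroduplex bands.
    """
    if allele_a == allele_b:
        return False
    if {allele_a, allele_b} == {DF508, DI507}:
        return False
    return True


def needs_probe_followup(allele_a, allele_b):
    """A lane without heteroduplex bands is ambiguous (normal or patient)."""
    return not heteroduplex_band_expected(allele_a, allele_b)


if __name__ == "__main__":
    lanes = {
        "control": (NORMAL, DF508),
        "father": (NORMAL, DI507),
        "mother": (NORMAL, DF508),
        "child": (DF508, DI507),
    }
    for name, genotype in lanes.items():
        print(name, heteroduplex_band_expected(*genotype))
```

A lane that lacks heteroduplex bands, such as the child's, is therefore the case in which follow-up with the 507 and 508 probes is informative.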

Similar alteration in gel mobility for heteroduplexes formed during PCR has also been reported for experimental systems where small deletions are involved (Nagamine et al supra). These mobility shifts may be used in general as the basis for the non-radioactive genetic screening tests.

5.3 CF Screening Programs

It is appreciated that approximately 1% of the carriers can be detected using the specific ΔI507 probes of this particular embodiment of the invention. Thus, if an individual tested is not a carrier using the ΔI507 probes, their carrier status cannot be excluded; they may carry some other mutation, such as the ΔF508 as previously noted. However, if both the individual and the spouse of the individual tested are carriers of the ΔI507 mutation, it can be stated with certainty that they are an at risk couple. The sequence of the gene as disclosed herein is an essential prerequisite for the determination of the other mutations.
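Why a negative ΔI507 result cannot exclude carrier status can be illustrated with a short Bayes calculation. This is a sketch under stated assumptions: the prior of 1 in 20 is the carrier frequency given in the Background, the 1% figure is treated as the fraction of carriers the probe detects, and the probe is assumed never to give a false positive. The function name is illustrative.

```python
def residual_carrier_risk(prior, detection_rate):
    """Probability of still being a carrier after a negative probe result.

    Assumes no false positives, so every non-carrier tests negative.
    """
    missed_carriers = prior * (1.0 - detection_rate)
    non_carriers = 1.0 - prior
    return missed_carriers / (missed_carriers + non_carriers)


prior = 1 / 20          # carrier frequency stated in the Background
detection_rate = 0.01   # approximately 1% of carriers detected by ΔI507 probes
print(round(residual_carrier_risk(prior, detection_rate), 4))
```

With these inputs the risk after a negative result remains just under 5%, essentially unchanged from the prior.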

Prenatal diagnosis is a logical extension of carrier screening. A couple can be identified as at risk for having a cystic fibrosis child in one of two ways. If they already have a cystic fibrosis child, they are both, by definition, obligate carriers of the defective CFTR gene, and each subsequent child has a 25% chance of being affected with cystic fibrosis. Alternatively, and as a major advantage of the present invention, the need for family pedigree analysis is eliminated: a gene mutation screening program as outlined above or other similar method can be used to identify a genetic mutation that leads to a protein with altered function. This is not dependent on prior ascertainment of the family through an affected child. Fetal DNA samples, for example, can be obtained, as previously mentioned, from amniotic fluid cells and chorionic villi specimens. Amplification by standard PCR techniques can then be performed on this template DNA.
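The 25% figure for a carrier couple follows from enumerating the four equally likely allele combinations of an autosomal recessive cross. The sketch below is illustrative only; the allele labels "N" (normal) and "m" (any CF mutation) and the function name are assumptions made for the example.

```python
from collections import Counter
from itertools import product

STATUS_BY_MUTANT_COUNT = ("non-carrier", "carrier", "affected")


def offspring_distribution(parent_a, parent_b):
    """Fraction of offspring in each class for two parental genotypes."""
    counts = Counter()
    for allele_a, allele_b in product(parent_a, parent_b):
        mutant_count = (allele_a != "N") + (allele_b != "N")
        counts[STATUS_BY_MUTANT_COUNT[mutant_count]] += 1
    total = sum(counts.values())
    return {status: n / total for status, n in counts.items()}


# Two obligate carriers, each with one normal and one mutant allele.
print(offspring_distribution(("N", "m"), ("N", "m")))
```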

If both parents are shown to be carriers with the ΔI507 deletion, the interpretation of the results would be the following. If there is hybridization of the fetal DNA to the normal probe, the fetus will not be affected with cystic fibrosis, although it may be a CF carrier (50% probability for each fetus of an at risk couple). If the fetal DNA hybridizes only to the ΔI507 deletion probe and not to the normal probe, the fetus will be affected with cystic fibrosis.
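The interpretation rules above can be written as a small decision function. This is a sketch that applies only under the stated premise that both parents carry the ΔI507 deletion; separating "carrier" from "non-carrier" by whether the deletion probe also hybridizes is an inference from the text, and the handling of a sample with no signal is an added assumption.

```python
def interpret_fetal_result(hybridizes_normal, hybridizes_deletion):
    """Interpret fetal probe hybridization for a ΔI507 carrier couple."""
    if hybridizes_normal and hybridizes_deletion:
        return "not affected, CF carrier"
    if hybridizes_normal:
        return "not affected, non-carrier"
    if hybridizes_deletion:
        return "affected"
    # Assumption: no signal with either probe means the assay must be repeated.
    return "uninterpretable"


print(interpret_fetal_result(True, True))
print(interpret_fetal_result(False, True))
```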

It is appreciated that for this and other mutations in the CF gene, a range of different specific procedures can be used to provide a complete diagnosis for all potential CF carriers or patients. A complete description of these procedures is given later.

The invention therefore provides a method and kit for determining if a subject is a CF carrier or CF patient. In summary, the screening method comprises the steps of:

providing a biological sample of the subject to be screened; and providing an assay for detecting in the biological sample, the presence of at least a member from the group consisting of a 507 mutant CF gene, 507 mutant CF gene products and mixtures thereof.

The method may be further characterized by including at least one more nucleotide probe which is a different DNA sequence fragment of, for example, the DNA of FIG. 1, or a different DNA sequence fragment of human chromosome 7 and located to either side of the DNA sequence of FIG. 1. In this respect, the DNA fragments of the intron portions of FIG. 2 are useful in further confirming the presence of the mutation. Unique aspects of the introns at the exon boundaries may be relied upon in screening procedures to further confirm the presence of the mutation at the I507 position or other mutant positions.

A kit, according to an embodiment of the invention, suitable for use in the screening technique and for assaying for the presence of the mutant CF gene by an immunoassay comprises:

(a) an antibody which specifically binds to a gene product of the mutant CF gene having a mutation at one of the amino acid positions of 85, 148, 178, 455, 493, 507, 542, 549, 551, 560, 563, 574, 1077 and 1092 SEQ ID NO:2;

(b) reagent means for detecting the binding of the antibody to the gene product; and

(c) the antibody and reagent means each being present in amounts effective to perform the immunoassay.

The kit for assaying for the presence of the mutant CF gene may also be provided by hybridization techniques. The kit comprises:

(a) an oligonucleotide probe which specifically binds to the mutant CF gene encoding a peptide having a mutation at one of the amino acid positions 85, 148, 178, 455, 493, 507, 542, 549, 551, 560, 563, 574, 1077 and 1092 SEQ ID NO:2;

(b) reagent means for detecting the hybridization of the oligonucleotide probe to the mutant CF gene; and

(c) the probe and reagent means each being present in amounts effective to perform the hybridization assay.

5.4 Antibodies to Detect Mutant CFTR

As mentioned, antibodies to epitopes within the mutant CFTR protein at positions 85, 148, 178, 455, 493, 507, 542, 549, 551, 560, 563, 574, 1077 and 1092 SEQ ID NO:2 are raised to provide extensive information on the characteristics of the mutant protein and other valuable information which includes:

1. The antibodies can be used to provide another technique in detecting any of the other CF mutations which result in the synthesis of a protein with an altered size.

2. Antibodies to distinct domains of the mutant protein can be used to determine the topological arrangement of the protein in the cell membrane. This provides information on segments of the protein which are accessible to externally added modulating agents for purposes of drug therapy.

3. The structure-function relationships of portions of the protein can be examined using specific antibodies. For example, it is possible to introduce into cells antibodies recognizing each of the charged cytoplasmic loops which join the transmembrane sequences as well as portions of the nucleotide binding folds and the R-domain. The influence of these antibodies on functional parameters of the protein provides insight into cell regulatory mechanisms and potentially suggests means of modulating the activity of the defective protein in a CF patient.

4. Antibodies with the appropriate avidity also enable immunoprecipitation and immuno-affinity purification of the protein. Immunoprecipitation will facilitate characterization of synthesis and post translational modification including ATP binding and phosphorylation. Purification will be required for studies of protein structure and for reconstitution of its function, as well as protein based therapy.

In order to prepare the antibodies, fusion proteins containing defined portions of any one of the mutant CFTR polypeptides can be synthesized in bacteria by expression of the corresponding mutant DNA sequence in a suitable cloning vehicle. Smaller peptides may be synthesized chemically. The fusion proteins can be purified, for example, by affinity chromatography on glutathione-agarose and the peptides coupled to a carrier protein (hemocyanin), mixed with Freund's adjuvant and injected into rabbits. Following booster injections at bi-weekly intervals, the rabbits are bled and sera isolated. The developed polyclonal antibodies in the sera may then be combined with the fusion proteins. Immunoblots are then formed by staining with, for example, alkaline-phosphatase conjugated second antibody in accordance with the procedure of Blake et al, Anal. Biochem. 136: 175 (1984).

Thus, it is possible to raise polyclonal antibodies specific for both fusion proteins containing portions of the mutant CFTR protein and peptides corresponding to short segments of its sequence. Similarly, mice can be injected with KLH conjugates of peptides to initiate the production of monoclonal antibodies to corresponding segments of mutant CFTR protein.

As for the generation of monoclonal antibodies, immunogens for the raising of monoclonal antibodies (mAbs) to the mutant CFTR protein are bacterial fusion proteins [Smith et al, Gene 67: 31 (1988)] containing portions of the CFTR polypeptide or synthetic peptides corresponding to short (12 to 25 amino acids in length) segments of the mutant sequence. The essential methodology is that of Kohler and Milstein [Nature 256: 495 (1975)].

Balb/c mice are immunized by intraperitoneal injection with 500 μg of pure fusion protein or synthetic peptide in incomplete Freund's adjuvant. A second injection is given after 14 days, a third after 21 days and a fourth after 28 days. Individual animals so immunized are sacrificed one, two and four weeks following the final injection. Spleens are removed, their cells dissociated, collected and fused with Sp2/O-Ag14 myeloma cells according to Gefter et al, Somatic Cell Genetics 3: 231 (1977). The fusion mixture is distributed in culture medium selective for the propagation of fused cells which are grown until they are about 25% confluent. At this time, culture supernatants are tested for the presence of antibodies reacting with a particular CFTR antigen. An alkaline phosphatase labelled anti-mouse second antibody is then used for detection of positives. Cells from positive culture wells are then expanded in culture, their supernatants collected for further testing and the cells stored deep frozen in cryoprotectant-containing medium. To obtain large quantities of a mAb, producer cells are injected into the peritoneum at 5×10⁶ cells per animal, and ascites fluid is obtained. Purification is by chromatography on Protein G- or Protein A-agarose according to Ey et al, Immunochemistry 15: 429 (1977).

Reactivity of these mAbs with the mutant CFTR protein can be confirmed by polyacrylamide gel electrophoresis of membranes isolated from epithelial cells in which it is expressed and immunoblotted [Towbin et al, Proc. Natl. Acad. Sci. USA 76: 4350 (1979)].

In addition to the use of monoclonal antibodies specific for the particular mutant domain of the CFTR protein to probe their individual functions, other mAbs, which can distinguish between the normal and mutant forms of CFTR protein, are used to detect the mutant protein in epithelial cell samples obtained from patients, such as nasal mucosa biopsy "brushings" [R. De-Lough and J. Rutland, J. Clin. Pathol. 42, 613 (1989)] or skin biopsy specimens containing sweat glands.

Antibodies capable of this distinction are obtained by differentially screening hybridomas from paired sets of mice immunized with a peptide containing, for example, the isoleucine at amino acid position 507 (e.g. GTIKENIIFGVSY) SEQ ID NO:32 or a peptide which is identical except for the absence of I507 (GTIKENIFGVSY) SEQ ID NO:33. mAbs capable of recognizing the other mutant forms of CFTR protein present in patients in addition to or instead of the I507 deletion are obtained using similar monoclonal antibody production strategies.
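The relationship between the paired immunizing peptides can be checked mechanically: removing a single isoleucine from the normal peptide yields the mutant peptide. In the sketch below the two sequences are those quoted above; the numbering offset (first residue taken as position 500, so that the single phenylalanine falls at 508) is an assumption inferred from the text, and the helper name is illustrative.

```python
NORMAL_PEPTIDE = "GTIKENIIFGVSY"  # SEQ ID NO:32 as quoted in the text
MUTANT_PEPTIDE = "GTIKENIFGVSY"   # SEQ ID NO:33 as quoted in the text

# Assumption: the first residue is position 500, placing the F at 508.
FIRST_POSITION = 500


def delete_residue(peptide, index):
    """Return the peptide with the residue at a 0-based index removed."""
    return peptide[:index] + peptide[index + 1:]


index_507 = 507 - FIRST_POSITION
print(NORMAL_PEPTIDE[index_507], delete_residue(NORMAL_PEPTIDE, index_507))
```

Because positions 506 and 507 are both isoleucine under this numbering, deleting either gives the same 12-residue peptide, so an antibody screen cannot tell which of the two was lost.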

Antibodies to normal and CF versions of CFTR protein and of segments thereof are used diagnostically in immunocytochemical and immunofluorescence light microscopy and immunoelectron microscopy to demonstrate the tissue, cellular and subcellular distribution of CFTR within the organs of CF patients, carriers and non-CF individuals.

Antibodies are used therapeutically to modulate, by promoting, the activity of the CFTR protein in CF patients and in cells of CF patients. Possible modes of such modulation might involve stimulation due to cross-linking of CFTR protein molecules with multivalent antibodies in analogy with stimulation of some cell surface membrane receptors, such as the insulin receptor [O'Brien et al, Euro. Mol. Biol. Organ. J. 6: 4003 (1987)], epidermal growth factor receptor [Schreiber et al, J. Biol. Chem. 258: 846 (1983)] and T-cell receptor-associated molecules such as CD4 [Veillette et al Nature, 338: 257 (1989)].

Antibodies are used to direct the delivery of therapeutic agents to the cells which express defective CFTR protein in CF. For this purpose, the antibodies are incorporated into a vehicle such as a liposome [Matthay et al, Cancer Res. 46: 4904 (1986)] which carries the therapeutic agent such as a drug or the normal gene.

5.5 RFLP Analysis

DNA diagnosis is currently being used to assess whether a fetus will be born with cystic fibrosis, but historically this has only been done after a particular set of parents has already had one cystic fibrosis child which identifies them as obligate carriers. However, in combination with carrier detection as outlined above, DNA diagnosis for all pregnancies of carrier couples will be possible. If the parents have already had a cystic fibrosis child, an extended haplotype analysis can be done on the fetus and thus the percentage of false positive or false negative will be greatly reduced. If the parents have not already had an affected child and the DNA diagnosis on the fetus is being performed on the basis of carrier detection, haplotype analysis can still be performed.

Although it has been thought for many years that there is a great deal of clinical heterogeneity in the cystic fibrosis disease, it is now emerging that there are two general categories, called pancreatic sufficiency (CF-PS) and pancreatic insufficiency (CF-PI). If the mutations related to these disease categories are well characterized, one can associate a particular mutation with a clinical phenotype of the disease. This allows changes in the treatment of each patient. Thus the nature of the mutation will to a certain extent predict the prognosis of the patient and indicate a specific treatment.

6.0 Molecular Biology of Cystic Fibrosis

The postulate that CFTR may regulate the activity of ion channels, particularly the outwardly rectifying Cl⁻ channel implicated as the functional defect in CF, can be tested by the injection and translation of full length in vitro transcribed CFTR mRNA in Xenopus oocytes. The ensuing changes in ion currents across the oocyte membrane can be measured as the potential is clamped at a fixed value. CFTR may regulate endogenous oocyte channels or it may be necessary to also introduce epithelial cell RNA to direct the translation of channel proteins. Use of mRNA coding for normal and for mutant CFTR, as provided by this invention, makes these experiments possible.

Other modes of expression in heterologous cell systems also facilitate dissection of structure-function relationships. The complete CFTR DNA sequence ligated into a plasmid expression vector is used to transfect cells so that its influence on ion transport can be assessed. Plasmid expression vectors containing part of the normal CFTR sequence along with portions of modified sequence at selected sites can be used in in vitro mutagenesis experiments performed in order to identify those portions of the CFTR protein which are crucial for regulatory function.

6.1 Expression of the Mutant DNA Sequence

The mutant DNA sequence can be manipulated in studies to understand the expression of the gene and its product, and to achieve production of large quantities of the protein for functional analysis, antibody production, and patient therapy. The changes in the sequence may or may not alter the expression pattern in terms of relative quantities, tissue-specificity and functional properties. The partial or full-length cDNA sequences, which encode the subject protein, unmodified or modified, may be ligated to bacterial expression vectors such as the pRIT (Nilsson et al. EMBO J. 4: 1075-1080 (1985)), pGEX (Smith and Johnson, Gene 67: 31-40 (1988)) or pATH (Spindler et al. J. Virol. 49: 132-141 (1984)) plasmids which can be introduced into E. coli cells for production of the corresponding proteins which may be isolated in accordance with the previously discussed protein purification procedures. The DNA sequence can also be transferred from its existing context to other cloning vehicles, such as other plasmids, bacteriophages, cosmids, animal virus, yeast artificial chromosomes (YAC) (Burke et al. Science 236: 806-812, (1987)), somatic cells, and other simple or complex organisms, such as bacteria, fungi (Timberlake and Marshall, Science 244: 1313-1317 (1989)), invertebrates, plants (Gasser and Fraley, Science 244: 1293 (1989)), and pigs (Pursel et al. Science 244: 1281-1288 (1989)).

For expression in mammalian cells, the cDNA sequence may be ligated to heterologous promoters, such as the simian virus (SV) 40 promoter in the pSV2 vector [Mulligan and Berg, Proc. Natl. Acad. Sci USA, 78: 2072-2076 (1981)] and introduced into cells, such as monkey COS-1 cells [Gluzman, Cell, 23: 175-182 (1981)], to achieve transient or long-term expression. The stable integration of the chimeric gene construct may be maintained in mammalian cells by biochemical selection, such as neomycin [Southern and Berg, J. Mol. Appln. Genet. 1: 327-341 (1982)] and mycophenolic acid [Mulligan and Berg, supra].

DNA sequences can be manipulated with standard procedures such as restriction enzyme digestion, fill-in with DNA polymerase, deletion by exonuclease, extension by terminal deoxynucleotide transferase, ligation of synthetic or cloned DNA sequences, site-directed sequence-alteration via single-stranded bacteriophage intermediate or with the use of specific oligonucleotides in combination with PCR.

The cDNA sequence (or portions derived from it), or a mini gene (a cDNA with an intron and its own promoter) is introduced into eukaryotic expression vectors by conventional techniques. These vectors are designed to permit the transcription of the cDNA in eukaryotic cells by providing regulatory sequences that initiate and enhance the transcription of the cDNA and ensure its proper splicing and polyadenylation. Vectors containing the promoter and enhancer regions of the simian virus (SV)40 or long terminal repeat (LTR) of the Rous Sarcoma virus and polyadenylation and splicing signal from SV 40 are readily available [Mulligan et al Proc. Natl. Acad. Sci. USA 78: 2072-2076, (1981); Gorman et al Proc Natl. Acad. Sci USA 79: 6777-6781 (1982)]. Alternatively, the CFTR endogenous promoter may be used. The level of expression of the cDNA can be manipulated with this type of vector, either by using promoters that have different activities (for example, the baculovirus pAC373 can express cDNAs at high levels in S. frugiperda cells [M. D. Summers and G. E. Smith in, Genetically Altered Viruses and the Environment (B. Fields, et al, eds.) vol. 22 pp 319-328, Cold Spring Harbour Laboratory Press, Cold Spring Harbour, N.Y., 1985]) or by using vectors that contain promoters amenable to modulation, for example the glucocorticoid-responsive promoter from the mouse mammary tumor virus [Lee et al, Nature 294: 228 (1982)]. The expression of the cDNA can be monitored in the recipient cells 24 to 72 hours after introduction (transient expression).

In addition, some vectors contain selectable markers, such as the gpt [Mulligan and Berg supra] or neo [Southern and Berg J. Mol. Appln. Genet 1: 327-341 (1982)] bacterial genes, that permit isolation of cells, by chemical selection, that have stable, long term expression of the vectors (and therefore the cDNA) in the recipient cell. The vectors can be maintained in the cells as episomal, freely replicating entities by using regulatory elements of viruses such as papilloma [Sarver et al Mol. Cell Biol, 1: 486 (1981)] or Epstein-Barr [Sugden et al Mol. Cell Biol. 5: 410 (1985)]. Alternatively, one can also produce cell lines that have integrated the vector into genomic DNA. Both of these types of cell lines produce the gene product on a continuous basis. One can also produce cell lines that have amplified the number of copies of the vector (and therefore of the cDNA as well) to create cell lines that can produce high levels of the gene product [Alt et al. J. Biol. Chem. 253: 1357 (1978)].

The transfer of DNA into eukaryotic, in particular human or other mammalian, cells is now a conventional technique. The vectors are introduced into the recipient cells as pure DNA (transfection) by, for example, precipitation with calcium phosphate [Graham and van der Eb, Virology 52: 466 (1973)] or strontium phosphate [Brash et al Mol. Cell Biol. 7: 2013 (1987)], electroporation [Neumann et al EMBO J 1: 841 (1982)], lipofection [Felgner et al Proc Natl. Acad. Sci USA 84: 7413 (1987)], DEAE dextran [McCutchan et al J. Natl Cancer Inst. 41: 351 (1968)], microinjection [Mueller et al Cell 15: 579 (1978)], protoplast fusion [Schafner, Proc Natl. Acad. Sci USA 72: 2163] or pellet guns [Klein et al, Nature 327: 70 (1987)]. Alternatively, the cDNA can be introduced by infection with virus vectors. Systems have been developed that use, for example, retroviruses [Bernstein et al. Genetic Engineering 7: 235, (1985)], adenoviruses [Ahmad et al J. Virol 57: 267 (1986)] or Herpes virus [Spaete et al Cell 30: 295 (1982)].

These eukaryotic expression systems can be used for many studies of the mutant CF gene and the mutant CFTR product, such as at protein positions 85, 148, 178, 455, 493, 507, 542, 549, 551, 560, 563, 574, 1077 and 1092. These include, for example: (1) determining that the gene is properly expressed and that all post-translational modifications necessary for full biological activity have been properly completed; (2) identifying regulatory elements located in the 5' region of the CF gene and their role in the tissue- or temporal-regulation of the expression of the CF gene; (3) producing large amounts of the normal protein for isolation and purification; (4) using cells expressing the CFTR protein as an assay system for antibodies generated against the CFTR protein or as an assay system to test the effectiveness of drugs; and (5) studying the function of the normal complete protein, specific portions of the protein, or of naturally occurring or artificially produced mutant proteins. Naturally occurring mutant proteins exist in patients with CF while artificially produced mutant proteins can be designed by site directed sequence alterations. These latter studies can probe the function of any desired amino acid residue in the protein by mutating the nucleotides coding for that amino acid.
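The idea of probing a residue by altering the nucleotides that code for it can be sketched in silico before any construct is made. The example below is illustrative only: the coding sequence is a toy sequence chosen to encode the fragment ENIIFGVSY and is not taken from the figures, the codon table covers only the codons it uses, and the function names are hypothetical.

```python
# Partial standard genetic code, limited to the codons in the toy sequence.
CODON_TABLE = {
    "GAA": "E", "AAT": "N", "ATC": "I", "TTT": "F",
    "GGT": "G", "GTT": "V", "TCC": "S", "TAT": "Y",
}

# Toy in-frame coding sequence (assumption, not the sequence of FIG. 1).
TOY_CDS = "GAAAATATCATCTTTGGTGTTTCCTAT"


def translate(cds):
    """Translate an in-frame coding sequence codon by codon."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def delete_codon(cds, codon_index):
    """Remove one whole codon (0-based) so the reading frame is preserved."""
    start = 3 * codon_index
    return cds[:start] + cds[start + 3:]


print(translate(TOY_CDS))
print(translate(delete_codon(TOY_CDS, 3)))
```

Removing a whole codon shortens the translated fragment by one residue and leaves the downstream reading frame intact, which is the behavior expected of a 3 bp deletion.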

Using the above techniques, the expression vectors containing the mutant CF gene sequence or fragments thereof can be introduced into human cells, mammalian cells from other species or non-mammalian cells as desired. The choice of cell is determined by the purpose of the treatment. For example, monkey COS cells [Gluzman, Cell 23: 175 (1981)], which produce high levels of the SV40 T antigen and permit the replication of vectors containing the SV40 origin of replication, can be used to show that the vector can express the protein product, since function is not required. Similar treatment could be performed with Chinese hamster ovary (CHO) or mouse NIH 3T3 fibroblasts or with human fibroblasts or lymphoblasts.

The recombinant cloning vector, according to this invention, then comprises the selected DNA of the DNA sequences of this invention for expression in a suitable host. The DNA is operatively linked in the vector to an expression control sequence in the recombinant DNA molecule so that normal CFTR polypeptide can be expressed. The expression control sequence may be selected from the group consisting of sequences that control the expression of genes of prokaryotic or eukaryotic cells and their viruses and combinations thereof. The expression control sequence may be specifically selected from the group consisting of the lac system, the trp system, the tac system, the trc system, major operator and promoter regions of phage lambda, the control region of fd coat protein, the early and late promoters of SV40, promoters derived from polyoma, adenovirus, retrovirus, baculovirus and simian virus, the promoter for 3-phosphoglycerate kinase, the promoters of yeast acid phosphatase, the promoter of the yeast alpha-mating factors and combinations thereof.

The host cell, which may be transfected with the vector of this invention, may be selected from the group consisting of E. coli, Pseudomonas, Bacillus subtilis, Bacillus stearothermophilus or other bacilli; other bacteria; yeast; fungi; insect; mouse or other animal; or plant hosts; or human tissue cells.

It is appreciated that for the mutant DNA sequence similar systems are employed to express and produce the mutant product.

6.2 Protein Function Considerations

To study the function of the mutant CFTR protein, it is preferable to use epithelial cells as recipients, since proper functional expression may require the presence of other pathways or gene products that are only expressed in such cells. Cells that can be used include, for example, human epithelial cell lines such as T84 (ATCC #CRL 248) or PANC-1 (ATCC #CLL 1469), or the T43 immortalized CF nasal epithelium cell line [Jettan et al, Science (1989)] and primary [Yanhoskes et al. Ann. Rev. Resp. Dis. 132: 1281 (1985)] or transformed [Scholte et al. Exp. Cell. Res. 182: 559 (1989)] human nasal polyp or airways cells, pancreatic cells [Harris and Coleman J. Cell. Sci. 87: 695 (1987)], or sweat gland cells [Collie et al. In Vitro 21: 597 (1985)] derived from normal or CF subjects. The CF cells can be used to test for the functional activity of mutant CF genes. Current functional assays available include the study of the movement of anions (Cl⁻ or I⁻) across cell membranes as a function of stimulation of cells by agents that raise intracellular cyclic AMP levels and activate chloride channels [Stutto et al. Proc. Nat. Acad. Sci. U. S. A. 82: 6677 (1985)]. Other assays include the measurement of changes in cellular potentials by patch clamping of whole cells or of isolated membranes [Frizzell et al. Science 233: 558 (1986), Welsch and Liedtke Nature 322: 467 (1986)] or the study of ion fluxes in epithelial sheets of confluent cells [Widdicombe et al. Proc. Nat. Acad. Sci. 82: 6167 (1985)]. Alternatively, RNA made from the CF gene could be injected into Xenopus oocytes. The oocyte will translate RNA into protein and allow its study. As other more specific assays are developed these can also be used in the study of transfected mutant CFTR protein function.

"Domain-switching" experiments between mutant CFTR and the human multidrug resistance P-glycoprotein can also be performed to further the study of the mutant CFTR protein. In these experiments, plasmid expression vectors are constructed by routine techniques from fragments of the mutant CFTR sequence and fragments of the sequence of P-glycoprotein ligated together by DNA ligase so that a protein containing the respective portions of these two proteins will be synthesized by a host cell transfected with the plasmid. The latter approach has the advantage that many experimental parameters associated with multidrug resistance can be measured. Hence, it is now possible to assess the ability of segments of mutant CFTR to influence these parameters.

These studies of the influence of mutant CFTR on ion transport will serve to bring the field of epithelial transport into the molecular arena.

6.3 Therapies

It is understood that the major aim of the various biochemical studies using the compositions of this invention is the development of therapies to circumvent or overcome the CF defect, using both the pharmacological and the "gene-therapy" approaches.

In the pharmacological approach, drugs which circumvent or overcome the CF defect are sought. Initially, compounds may be tested essentially at random, and screening systems are required to discriminate among many candidate compounds. This invention provides host cell systems, expressing various of the mutant CF genes, which are particularly well suited for use as first level screening systems. Preferably, a cell culture system using mammalian cells (most preferably human cells) transfected with an expression vector comprising a DNA sequence coding for CFTR protein containing a CF-generating mutation, for example the I507 deletion, is used in the screening process. Candidate drugs are tested by incubating the cells in the presence of the candidate drug and measuring those cellular functions dependent on CFTR, especially by measuring ion currents where the transmembrane potential is clamped at a fixed value. To accommodate the large number of assays, however, more convenient assays are based, for example, on the use of ion-sensitive fluorescent dyes. To detect changes in Cl⁻ ion concentration, SPQ or its analogues are useful.
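A dye-based readout of this kind has to be converted into a chloride concentration. The sketch below assumes that quenching of the dye by chloride is collisional and follows the Stern-Volmer relation F0/F = 1 + Ksv·[Cl⁻], which is a common way of modeling halide-sensitive dyes; the relation, the Ksv value and the function names are assumptions for illustration and are not stated in this specification.

```python
def fluorescence_ratio(chloride_mm, ksv_per_mm):
    """Stern-Volmer ratio F0/F for a given chloride concentration (mM)."""
    return 1.0 + ksv_per_mm * chloride_mm


def chloride_from_fluorescence(f0, f, ksv_per_mm):
    """Invert the Stern-Volmer relation to estimate chloride (mM)."""
    return (f0 / f - 1.0) / ksv_per_mm


# Hypothetical calibration constant; it must be measured for each cell system.
KSV_PER_MM = 0.02

print(chloride_from_fluorescence(f0=100.0, f=50.0, ksv_per_mm=KSV_PER_MM))
```

In a screen, a candidate drug that restores chloride movement would be expected to change the measured fluorescence over time; the conversion above only turns each reading into a concentration estimate.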

Alternatively, a cell-free system could be used. Purified CFTR could be reconstituted into artificial membranes and drugs could be screened in a cell-free assay [Al-Aqwatt, Science, (1989)].

At the second level, animal testing is required. It is possible to develop a model of CF by interfering with the normal expression of the counterpart of the CF gene in an animal such as the mouse. The "knock-out" of this gene by introducing a mutant form of it into the germ line of animals will provide a strain of animals with CF-like syndromes. This enables testing of drugs which showed promise in the first level cell-based screen.

As further knowledge is gained about the nature of the protein and its function, it will be possible to predict structures of proteins or other compounds that interact with the CFTR protein. That in turn will allow for certain predictions to be made about potential drugs that will interact with this protein and have some effect on the treatment of the patients. Ultimately such drugs may be designed and synthesized chemically on the basis of structures predicted to be required to interact with domains of CFTR. This approach is reviewed in Capsey and Delvatte, Genetically Engineered Human Therapeutic Drugs Stockton Press, New York, 1988. These potential drugs must also be tested in the screening system.

6.3.1 Protein Replacement Therapy

Treatment of CF can be performed by replacing the defective protein with normal protein, by modulating the function of the defective protein or by modifying another step in the pathway in which CFTR participates in order to correct the physiological abnormality.

To be able to replace the defective protein with the normal version, one must have reasonably large amounts of pure CFTR protein. Pure protein can be obtained as described earlier from cultured cell systems. Delivery of the protein to the affected airways tissue will require its packaging in lipid-containing vesicles that facilitate the incorporation of the protein into the cell membrane. It may also be feasible to use vehicles that incorporate proteins such as surfactant protein, such as SAP(Val) or SAP(Phe), that perform this function naturally, at least for lung alveolar cells (PCT Patent Application WO/8803170, Whitsett et al, May 7, 1988 and PCT Patent Application WO89/04327, Benson et al, May 18, 1989). The CFTR-containing vesicles are introduced into the airways by inhalation or irrigation, techniques that are currently used in CF treatment (Boat et al, supra).

6.3.2 Drug Therapy

Modulation of CFTR function can be accomplished by the use oftherapeutic agents (drugs). These can be identified by random approachesusing a screening program in which their effectiveness in modulating thedefective CFTR protein is monitored in vitro. Screening programs can usecultured cell systems in which the defective CFTR protein is expressed.Alternatively, drugs can be designed to modulate CFTR activity fromknowledge of the structure and function correlations of CFTR protein andfrom knowledge of the specific defect in the CFTR mutant protein (Capseyand Delvatte, supra). It is possible that the mutant CFTR protein willrequire a different drug for specific modulation. It will then benecessary to identify the specific mutation(s) in each CF patient beforeinitiating drug therapy.

Drugs can be designed to interact with different aspects of CFTR proteinstructure or function. For example, a drug (or antibody) can bind to astructural fold of the protein to correct a defective structure.Alternatively, a drug might bind to a specific functional residue andincrease its affinity for a substrate or cofactor. Since it is knownthat members of the class of proteins to which CFTR has structuralhomology can interact, bind and transport a variety of drugs, it isreasonable to expect that drug-related therapies may be effective intreatment of CF.

A third mechanism for enhancing the activity of an effective drug wouldbe to modulate the production or the stability of CFTR inside the cell.This increase in the amount of CFTR could compensate for its defectivefunction.

Drug therapy can also be used to compensate for the defective CFTR function by interactions with other components of the physiological or biochemical pathway necessary for the expression of the CFTR function. These interactions can lead to increases or decreases in the activity of these ancillary proteins. The methods for the identification of these drugs would be similar to those described above for CFTR-related drugs.

In other genetic disorders, it has been possible to correct for the consequences of altered or missing normal functions by use of dietary modifications. This has taken the form of removal of metabolites, as in the case of phenylketonuria, where phenylalanine is removed from the diet in the first five years of life to prevent mental retardation, or of the addition of large amounts of metabolites to the diet, as in the case of adenosine deaminase deficiency, where the functional correction of the activity of the enzyme can be produced by the addition of the enzyme to the diet. Thus, once the details of the CFTR function have been elucidated and the basic defect in CF has been defined, therapy may be achieved by dietary manipulations.

The second potential therapeutic approach is so-called "gene therapy", in which normal copies of the CF gene are introduced into patients so as to successfully code for normal protein in the key epithelial cells of affected tissues. It is most crucial to attempt to achieve this with the airway epithelial cells of the respiratory tract. The CF gene is delivered to these cells in a form in which it can be taken up and code for sufficient protein to provide regulatory function. As a result, the patient's quality of life will be greatly improved and length of life greatly extended. Ultimately, of course, the aim is to deliver the gene to all affected tissues.

6.3.3 Gene Therapy

One approach to therapy of CF is to insert a normal version of the CF gene into the airway epithelium of affected patients. It is important to note that disease of the respiratory system is the primary cause of morbidity and mortality in CF; while pancreatic disease is a major feature, it is relatively well treated today with enzyme supplementation. Thus, somatic cell gene therapy [for a review, see T. Friedmann, Science 244: 1275 (1989)] targeting the airway would alleviate the most severe problems associated with CF.

A. Retroviral Vectors. Retroviruses have been considered the preferred vector for experiments in somatic gene therapy, with a high efficiency of infection and stable integration and expression [Orkin et al Prog. Med. Genet 7: 130 (1988)]. A possible drawback is that cell division is necessary for retroviral integration, so that the targeted cells in the airway may have to be nudged into the cell cycle prior to retroviral infection, perhaps by chemical means. The full length CF gene cDNA can be cloned into a retroviral vector and driven from either its endogenous promoter or from the retroviral LTR (long terminal repeat). Expression of the normal protein at levels as low as 10% of the endogenous mutant protein in CF patients would be expected to be beneficial, since this is a recessive disease. Delivery of the virus could be accomplished by aerosol or instillation into the trachea.

B. Other Viral Vectors. Other delivery systems which can be utilized include adeno-associated virus [AAV, McLaughlin et al, J. Virol 62: 1963 (1988)], vaccinia virus [Moss et al Annu. Rev. Immunol 5: 305 (1987)], bovine papilloma virus [Rasmussen et al, Methods Enzymol 139: 642 (1987)] or a member of the herpesvirus group such as Epstein-Barr virus [Margolskee et al Mol. Cell. Biol 8: 2937 (1988)]. Though much would need to be learned about their basic biology, the idea of using a viral vector with natural tropism for the respiratory tract (e.g. respiratory syncytial virus, echovirus, Coxsackie virus, etc.) is possible.

C. Non-viral Gene Transfer. Other methods of inserting the CF gene into respiratory epithelium may also be productive; many of these are of lower efficiency and would potentially require infection in vitro, selection of transfectants, and reimplantation. These methods include calcium phosphate, DEAE dextran, electroporation, and protoplast fusion. A particularly attractive idea is the use of liposomes, which might be possible to carry out in vivo [Ostro, Liposomes, Marcel-Dekker, 1987]. Synthetic cationic lipids such as DOTMA [Felgner et al Proc. Natl. Acad. Sci USA 84: 7413 (1987)] may increase the efficiency and ease of carrying out this approach.

6.4 CF Animal Models

The creation of a mouse or other animal model for CF will be crucial to understanding the disease and for testing of possible therapies [for general review of creating animal models, see Erickson, Am. J. Hum. Genet 43: 582 (1988)]. Currently no animal model of CF exists. The evolutionary conservation of the CF gene (as demonstrated by the cross-species hybridization blots for E4.3 and H1.6), as shown in FIG. 4, indicates that an orthologous gene exists in the mouse (hereafter denoted mCF, and its corresponding protein mCFTR), and it will be possible to clone this gene from mouse genomic and cDNA libraries using the human CF gene probes. It is expected that the generation of a specific mutation in the mouse gene analogous to the I507 mutation will be optimal for reproducing the phenotype, though complete inactivation of the mCFTR gene will also be a useful mutant to generate.
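As an illustration only, the in-frame character of an I507-type deletion can be sketched with the human codons 503 to 509 taken from SEQ ID NO: 1. The helper names (`delete_codon`, `translate`) and the abbreviated codon table are assumptions introduced for this sketch; the mouse sequence is not given here, so the human sequence stands in for it.

```python
# Illustrative sketch: remove one codon in frame and compare the translations.
# Only the codons needed for this window are listed (standard genetic code).
CODON_TABLE = {
    "aaa": "Lys", "gaa": "Glu", "aat": "Asn",
    "atc": "Ile", "ttt": "Phe", "ggt": "Gly",
}

# Codons 503-509 of the human CF coding sequence as listed in SEQ ID NO: 1.
FIRST_CODON = 503
NORMAL = "aaa gaa aat atc atc ttt ggt".split()


def delete_codon(codons, first_codon, target):
    """Return a copy of `codons` with residue number `target` removed in frame."""
    index = target - first_codon
    if not 0 <= index < len(codons):
        raise ValueError("target codon lies outside the supplied window")
    return codons[:index] + codons[index + 1:]


def translate(codons):
    """Translate a list of lower-case codons into three-letter residues."""
    return [CODON_TABLE[codon] for codon in codons]


mutant = delete_codon(NORMAL, FIRST_CODON, 507)
print(" ".join(translate(NORMAL)))
print(" ".join(translate(mutant)))
```

Because exactly three bases are removed, the downstream reading frame is unchanged and the mutant differs from the normal product by a single missing isoleucine.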

A. Mutagenesis. Inactivation of the mCF gene can be achieved by chemical [e.g. Johnson et al Proc. Natl. Acad. Sci. USA 78: 3138 (1981)] or X-ray mutagenesis [Popp et al J. Mol. Biol. 127: 141 (1979)] of mouse gametes, followed by fertilization. Offspring heterozygous for inactivation of mCFTR can then be identified by Southern blotting to demonstrate loss of one allele by dosage, or failure to inherit one parental allele if an RFLP marker is being assessed. This approach has previously been successfully used to identify mouse mutants for α-globin [Whitney et al Proc. Natl. Acad. Sci. USA 77: 1087 (1980)], phenylalanine hydroxylase [McDonald et al Pediatr. Res 23: 63 (1988)], and carbonic anhydrase II [Lewis et al Proc. Natl. Acad. Sci. USA 85: 1962 (1988)].

B. Transgenics. A mutant version of CFTR or mouse CFTR can be inserted into the mouse germ line using now standard techniques of oocyte injection [Camper, Trends in Genetics (1988)]; alternatively, if it is desirable to inactivate or replace the endogenous mCF gene, the homologous recombination system using embryonic stem (ES) cells [Capecchi, Science 244: 1288 (1989)] may be applied.

1. Oocyte Injection. Placing one or more copies of the normal or mutant mCF gene at a random location in the mouse germline can be accomplished by microinjection of the pronucleus of a just-fertilized mouse oocyte, followed by reimplantation into a pseudo-pregnant foster mother. The liveborn mice can then be screened for integrants using analysis of tail DNA for the presence of human CF gene sequences. The same protocol can be used to insert a mutant mCF gene. To generate a mouse model, one would want to place this transgene in a mouse background where the endogenous mCF gene has been inactivated, either by mutagenesis (see above) or by homologous recombination (see below). The transgene can be either: a) a complete genomic sequence, though the size of this (about 250 kb) would require that it be injected as a yeast artificial chromosome or a chromosome fragment; b) a cDNA with either the natural promoter or a heterologous promoter; c) a "minigene" containing all of the coding region and various other elements such as introns, promoter, and 3' flanking elements found to be necessary for optimum expression.

2. Retroviral Infection of Early Embryos. This alternative involves inserting the CFTR or mCF gene into a retroviral vector and directly infecting mouse embryos at early stages of development, generating a chimera [Soriano et al Cell 46: 19 (1986)]. At least some of these will lead to germline transmission.

3. ES Cells and Homologous Recombination. The embryonic stem cell approach [Capecchi, supra and Capecchi, Trends Genet 5: 70 (1989)] allows the possibility of performing gene transfer and then screening the resulting totipotent cells to identify the rare homologous recombination events. Once identified, these can be used to generate chimeras by injection of mouse blastocysts, and a proportion of the resulting mice will show germline transmission from the recombinant line. There are several ways this could be useful in the generation of a mouse model for CF:

a) Inactivation of the mCF gene can be conveniently accomplished by designing a DNA fragment which contains sequences from a mCFTR exon flanking a selectable marker such as neo. Homologous recombination will lead to insertion of the neo sequences in the middle of an exon, inactivating mCFTR. The homologous recombination events (usually about 1 in 1000) can be distinguished from the heterologous ones by DNA analysis of individual clones [usually using PCR, Kim et al Nucleic Acids Res. 16: 8887 (1988), Joyner et al Nature 338: 153 (1989); Zimmer et al supra, p. 150] or by using a negative selection against the heterologous events [such as the use of an HSV TK gene at the end of the construct, followed by gancyclovir selection, Mansour et al, Nature 336: 348 (1988)]. This inactivated mCFTR mouse can then be used to introduce a mutant CF gene or mCF gene containing, for example, the I507 abnormality or any other desired mutation.

b) It is possible that specific mutants of mCFTR can be created in one step. For example, one can make a construct containing mCF intron 9 sequences at the 5' end, a selectable neo gene in the middle, and intron 9 + exon 10 (containing the mouse version of the I507 mutation) at the 3' end. A homologous recombination event would lead to the insertion of the neo gene in intron 9 and the replacement of exon 10 with the mutant version.

c) If the presence of the selectable neo marker in the intron altered expression of the mCF gene, it would be possible to excise it in a second homologous recombination step.

d) It is also possible to create mutations in the mouse germline by injecting oligonucleotides containing the mutation of interest and screening the resulting cells by PCR.
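The screening burden implied by the homologous recombination frequency quoted in a) above (about 1 in 1000) can be estimated with a short calculation. This is a rough sketch under the assumption that each clone is an independent trial with the same probability of being a homologous recombinant; the function name `clones_needed` and the 95% confidence level are illustrative choices, not part of the described method.

```python
import math


def clones_needed(frequency, confidence):
    """Smallest number of clones n for which the chance of finding at least one
    homologous recombinant, 1 - (1 - frequency)**n, reaches `confidence`.

    Assumes independent clones with an identical recombination frequency.
    """
    if not (0.0 < frequency < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("frequency and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - frequency))


# Frequency of about 1 in 1000, as quoted for homologous events.
n = clones_needed(frequency=0.001, confidence=0.95)
print(n)
```

Under these assumptions the answer is on the order of a few thousand clones, which is why PCR screening of pooled clones or negative selection against the heterologous events is attractive.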

This embodiment of the invention has considered primarily a mouse model for cystic fibrosis. FIG. 4 shows cross-species hybridization not only to mouse DNA, but also to bovine, hamster and chicken DNA. Thus, it is contemplated that an orthologous gene will exist in many other species also, and that it will be possible to generate other animal models using similar technology.

Although preferred embodiments of the invention have been described herein in detail, it will be understood by those skilled in the art that variations may be made thereto without departing from the spirit of the invention or the scope of the appended claims.

    __________________________________________________________________________    #             SEQUENCE LISTING                                                   - -  - - <160> NUMBER OF SEQ ID NOS: 33                                       - - <210> SEQ ID NO 1                                                        <211> LENGTH: 6130                                                            <212> TYPE: DNA                                                               <213> ORGANISM: Homo sapiens                                                  <220> FEATURE:                                                                <221> NAME/KEY: CDS                                                           <222> LOCATION: (133)..(4572)                                                  - - <400> SEQUENCE: 1                                                         - - aattggaagc aaatgacatc acagcaggtc agagaaaaag ggttgagcgg ca -            #ggcaccca 60                                                                     - - gagtagtagg tctttggcat taggagcttg agcccagacg gccctagcag gg -            #accccagc 120                                                                    - - gcccgagaga cc atg cag agg tcg cct ctg gaa aag - # gcc agc gtt gtc      tcc 171                                                                                       Met Gln Ar - #g Ser Pro Leu Glu Lys Ala Ser Val Val Ser                        1  - #             5     - #             10                    - - aaa ctt ttt ttc agc tgg acc aga cca att tt - #g agg aaa gga tac aga       219                                                                          Lys Leu Phe Phe Ser Trp Thr Arg Pro Ile Le - #u Arg Lys Gly Tyr Arg                15             - #     20             - #     25                          - - cag cgc ctg gaa ttg tca gac ata tac caa at - #c cct tct gtt gat tct       267                                                                          Gln Arg Leu Glu Leu Ser Asp Ile Tyr Gln Il - 
#e Pro Ser Val Asp Ser            30                 - # 35                 - # 40                 - # 45       - - gct gac aat cta tct gaa aaa ttg gaa aga ga - #a tgg gat aga gag ctg       315                                                                          Ala Asp Asn Leu Ser Glu Lys Leu Glu Arg Gl - #u Trp Asp Arg Glu Leu                            50 - #                 55 - #                 60              - - gct tca aag aaa aat cct aaa ctc att aat gc - #c ctt cgg cga tgt ttt       363                                                                          Ala Ser Lys Lys Asn Pro Lys Leu Ile Asn Al - #a Leu Arg Arg Cys Phe                        65     - #             70     - #             75                  - - ttc tgg aga ttt atg ttc tat gga atc ttt tt - #a tat tta ggg gaa gtc       411                                                                          Phe Trp Arg Phe Met Phe Tyr Gly Ile Phe Le - #u Tyr Leu Gly Glu Val                    80         - #         85         - #         90                      - - acc aaa gca gta cag cct ctc tta ctg gga ag - #a atc ata gct tcc tat       459                                                                          Thr Lys Ala Val Gln Pro Leu Leu Leu Gly Ar - #g Ile Ile Ala Ser Tyr                95             - #    100             - #    105                          - - gac ccg gat aac aag gag gaa cgc tct atc gc - #g att tat cta ggc ata       507                                                                          Asp Pro Asp Asn Lys Glu Glu Arg Ser Ile Al - #a Ile Tyr Leu Gly Ile           110                 1 - #15                 1 - #20                 1 -      #25                                                                              - - ggc tta tgc ctt ctc ttt att gtg agg aca ct - #g ctc cta cac cca        gcc   555                                                                       Gly Leu Cys Leu Leu Phe Ile Val Arg Thr Le - #u Leu Leu His Pro Ala                          
130  - #               135  - #               140              - - att ttt ggc ctt cat cac att gga atg cag at - #g aga ata gct atg ttt       603                                                                          Ile Phe Gly Leu His His Ile Gly Met Gln Me - #t Arg Ile Ala Met Phe                       145      - #           150      - #           155                  - - agt ttg att tat aag aag act tta aag ctg tc - #a agc cgt gtt cta gat       651                                                                          Ser Leu Ile Tyr Lys Lys Thr Leu Lys Leu Se - #r Ser Arg Val Leu Asp                   160          - #       165          - #       170                      - - aaa ata agt att gga caa ctt gtt agt ctc ct - #t tcc aac aac ctg aac       699                                                                          Lys Ile Ser Ile Gly Gln Leu Val Ser Leu Le - #u Ser Asn Asn Leu Asn               175              - #   180              - #   185                          - - aaa ttt gat gaa gga ctt gca ttg gca cat tt - #c gtg tgg atc gct cct       747                                                                          Lys Phe Asp Glu Gly Leu Ala Leu Ala His Ph - #e Val Trp Ile Ala Pro           190                 1 - #95                 2 - #00                 2 -      #05                                                                              - - ttg caa gtg gca ctc ctc atg ggg cta atc tg - #g gag ttg tta cag        gcg   795                                                                       Leu Gln Val Ala Leu Leu Met Gly Leu Ile Tr - #p Glu Leu Leu Gln Ala                          210  - #               215  - #               220              - - tct gcc ttc tgt gga ctt ggt ttc ctg ata gt - #c ctt gcc ctt ttt cag       843                                                                          Ser Ala Phe Cys Gly Leu Gly Phe Leu Ile Va - #l Leu Ala Leu Phe Gln                       225      - #           230      - #           235     
             - - gct ggg cta ggg aga atg atg atg aag tac ag - #a gat cag aga gct ggg       891                                                                          Ala Gly Leu Gly Arg Met Met Met Lys Tyr Ar - #g Asp Gln Arg Ala Gly                   240          - #       245          - #       250                      - - aag atc agt gaa aga ctt gtg att acc tca ga - #a atg att gaa aat atc       939                                                                          Lys Ile Ser Glu Arg Leu Val Ile Thr Ser Gl - #u Met Ile Glu Asn Ile               255              - #   260              - #   265                          - - caa tct gtt aag gca tac tgc tgg gaa gaa gc - #a atg gaa aaa atg att       987                                                                          Gln Ser Val Lys Ala Tyr Cys Trp Glu Glu Al - #a Met Glu Lys Met Ile           270                 2 - #75                 2 - #80                 2 -      #85                                                                              - - gaa aac tta aga caa aca gaa ctg aaa ctg ac - #t cgg aag gca gcc        tat   1035                                                                      Glu Asn Leu Arg Gln Thr Glu Leu Lys Leu Th - #r Arg Lys Ala Ala Tyr                          290  - #               295  - #               300              - - gtg aga tac ttc aat agc tca gcc ttc ttc tt - #c tca ggg ttc ttt gtg       1083                                                                         Val Arg Tyr Phe Asn Ser Ser Ala Phe Phe Ph - #e Ser Gly Phe Phe Val                       305      - #           310      - #           315                  - - gtg ttt tta tct gtg ctt ccc tat gca cta at - #c aaa gga atc atc ctc       1131                                                                         Val Phe Leu Ser Val Leu Pro Tyr Ala Leu Il - #e Lys Gly Ile Ile Leu                   320          - #       325          - #       330                      - - cgg aaa ata ttc acc acc atc tca 
ttc tgc at - #t gtt ctg cgc atg gcg       1179                                                                         Arg Lys Ile Phe Thr Thr Ile Ser Phe Cys Il - #e Val Leu Arg Met Ala               335              - #   340              - #   345                          - - gtc act cgg caa ttt ccc tgg gct gta caa ac - #a tgg tat gac tct ctt       1227                                                                         Val Thr Arg Gln Phe Pro Trp Ala Val Gln Th - #r Trp Tyr Asp Ser Leu           350                 3 - #55                 3 - #60                 3 -      #65                                                                              - - gga gca ata aac aaa ata cag gat ttc tta ca - #a aag caa gaa tat        aag   1275                                                                      Gly Ala Ile Asn Lys Ile Gln Asp Phe Leu Gl - #n Lys Gln Glu Tyr Lys                          370  - #               375  - #               380              - - aca ttg gaa tat aac tta acg act aca gaa gt - #a gtg atg gag aat gta       1323                                                                         Thr Leu Glu Tyr Asn Leu Thr Thr Thr Glu Va - #l Val Met Glu Asn Val                       385      - #           390      - #           395                  - - aca gcc ttc tgg gag gag gga ttt ggg gaa tt - #a ttt gag aaa gca aaa       1371                                                                         Thr Ala Phe Trp Glu Glu Gly Phe Gly Glu Le - #u Phe Glu Lys Ala Lys                   400          - #       405          - #       410                      - - caa aac aat aac aat aga aaa act tct aat gg - #t gat gac agc ctc ttc       1419                                                                         Gln Asn Asn Asn Asn Arg Lys Thr Ser Asn Gl - #y Asp Asp Ser Leu Phe               415              - #   420              - #   425                          - - ttc agt aat ttc tca ctt ctt ggt act cct gt - #c ctg aaa gat att aat       1467    
                                                                     Phe Ser Asn Phe Ser Leu Leu Gly Thr Pro Va - #l Leu Lys Asp Ile Asn           430                 4 - #35                 4 - #40                 4 -      #45                                                                              - - ttc aag ata gaa aga gga cag ttg ttg gcg gt - #t gct gga tcc act        gga   1515                                                                      Phe Lys Ile Glu Arg Gly Gln Leu Leu Ala Va - #l Ala Gly Ser Thr Gly                          450  - #               455  - #               460              - - gca ggc aag act tca ctt cta atg atg att at - #g gga gaa ctg gag cct       1563                                                                         Ala Gly Lys Thr Ser Leu Leu Met Met Ile Me - #t Gly Glu Leu Glu Pro                       465      - #           470      - #           475                  - - tca gag ggt aaa att aag cac agt gga aga at - #t tca ttc tgt tct cag       1611                                                                         Ser Glu Gly Lys Ile Lys His Ser Gly Arg Il - #e Ser Phe Cys Ser Gln                   480          - #       485          - #       490                      - - ttt tcc tgg att atg cct ggc acc att aaa ga - #a aat atc atc ttt ggt       1659                                                                         Phe Ser Trp Ile Met Pro Gly Thr Ile Lys Gl - #u Asn Ile Ile Phe Gly               495              - #   500              - #   505                          - - gtt tcc tat gat gaa tat aga tac aga agc gt - #c atc aaa gca tgc caa       1707                                                                         Val Ser Tyr Asp Glu Tyr Arg Tyr Arg Ser Va - #l Ile Lys Ala Cys Gln           510                 5 - #15                 5 - #20                 5 -      #25                                                                              - - cta gaa gag gac atc tcc aag ttt gca gag aa - #a gac 
aat ata gtt        ctt   1755                                                                      Leu Glu Glu Asp Ile Ser Lys Phe Ala Glu Ly - #s Asp Asn Ile Val Leu                          530  - #               535  - #               540              - - gga gaa ggt gga atc aca ctg agt gga ggt ca - #a cga gca aga att tct       1803                                                                         Gly Glu Gly Gly Ile Thr Leu Ser Gly Gly Gl - #n Arg Ala Arg Ile Ser                       545      - #           550      - #           555                  - - tta gca aga gca gta tac aaa gat gct gat tt - #g tat tta tta gac tct       1851                                                                         Leu Ala Arg Ala Val Tyr Lys Asp Ala Asp Le - #u Tyr Leu Leu Asp Ser                   560          - #       565          - #       570                      - - cct ttt gga tac cta gat gtt tta aca gaa aa - #a gaa ata ttt gaa agc       1899                                                                         Pro Phe Gly Tyr Leu Asp Val Leu Thr Glu Ly - #s Glu Ile Phe Glu Ser               575              - #   580              - #   585                          - - tgt gtc tgt aaa ctg atg gct aac aaa act ag - #g att ttg gtc act tct       1947                                                                         Cys Val Cys Lys Leu Met Ala Asn Lys Thr Ar - #g Ile Leu Val Thr Ser           590                 5 - #95                 6 - #00                 6 -      #05                                                                              - - aaa atg gaa cat tta aag aaa gct gac aaa at - #a tta att ttg cat        gaa   1995                                                                      Lys Met Glu His Leu Lys Lys Ala Asp Lys Il - #e Leu Ile Leu His Glu                          610  - #               615  - #               620              - - ggt agc agc tat ttt tat ggg aca ttt tca ga - #a ctc caa aat cta cag       2043                         
                                                Gly Ser Ser Tyr Phe Tyr Gly Thr Phe Ser Gl - #u Leu Gln Asn Leu Gln                       625      - #           630      - #           635                  - - cca gac ttt agc tca aaa ctc atg gga tgt ga - #t tct ttc gac caa ttt       2091                                                                         Pro Asp Phe Ser Ser Lys Leu Met Gly Cys As - #p Ser Phe Asp Gln Phe                   640          - #       645          - #       650                      - - agt gca gaa aga aga aat tca atc cta act ga - #g acc tta cac cgt ttc       2139                                                                         Ser Ala Glu Arg Arg Asn Ser Ile Leu Thr Gl - #u Thr Leu His Arg Phe               655              - #   660              - #   665                          - - tca tta gaa gga gat gct cct gtc tcc tgg ac - #a gaa aca aaa aaa caa       2187                                                                         Ser Leu Glu Gly Asp Ala Pro Val Ser Trp Th - #r Glu Thr Lys Lys Gln           670                 6 - #75                 6 - #80                 6 -      #85                                                                              - - tct ttt aaa cag act gga gag ttt ggg gaa aa - #a agg aag aat tct        att   2235                                                                      Ser Phe Lys Gln Thr Gly Glu Phe Gly Glu Ly - #s Arg Lys Asn Ser Ile                          690  - #               695  - #               700              - - ctc aat cca atc aac tct ata cga aaa ttt tc - #c att gtg caa aag act       2283                                                                         Leu Asn Pro Ile Asn Ser Ile Arg Lys Phe Se - #r Ile Val Gln Lys Thr                       705      - #           710      - #           715                  - - ccc tta caa atg aat ggc atc gaa gag gat tc - #t gat gag cct tta gag       2331                                                                         
Pro Leu Gln Met Asn Gly Ile Glu Glu Asp Se - #r Asp Glu Pro Leu Glu                   720          - #       725          - #       730                      - - aga agg ctg tcc tta gta cca gat tct gag ca - #g gga gag gcg ata ctg       2379                                                                         Arg Arg Leu Ser Leu Val Pro Asp Ser Glu Gl - #n Gly Glu Ala Ile Leu               735              - #   740              - #   745                          - - cct cgc atc agc gtg atc agc act ggc ccc ac - #g cct cag gca cga agg       2427                                                                         Pro Arg Ile Ser Val Ile Ser Thr Gly Pro Th - #r Pro Gln Ala Arg Arg           750                 7 - #55                 7 - #60                 7 -      #65                                                                              - - agg cag tct gtc ctg aac ctg atg aca cac tc - #a gtt aac caa ggt        cag   2475                                                                      Arg Gln Ser Val Leu Asn Leu Met Thr His Se - #r Val Asn Gln Gly Gln                          770  - #               775  - #               780              - - aac att cac cga aag aca aca gca tcc aca cg - #a aaa gtg tca ctg gcc       2523                                                                         Asn Ile His Arg Lys Thr Thr Ala Ser Thr Ar - #g Lys Val Ser Leu Ala                       785      - #           790      - #           795                  - - cct cag gca aac ttg act gaa ctg gat ata ta - #t tca aga agg tta tct       2571                                                                         Pro Gln Ala Asn Leu Thr Glu Leu Asp Ile Ty - #r Ser Arg Arg Leu Ser                   800          - #       805          - #       810                      - - caa gaa act ggc ttg gaa ata agt gaa gaa at - #t aac gaa gaa gac tta       2619                                                                         Gln Glu Thr Gly Leu Glu Ile Ser Glu Glu Il - #e 
Asn Glu Glu Asp Leu               815              - #   820              - #   825                          - - aag gag tgc ttt ttt gat gat atg gag agc at - #a cca gca gtg act aca       2667                                                                         Lys Glu Cys Phe Phe Asp Asp Met Glu Ser Il - #e Pro Ala Val Thr Thr           830                 8 - #35                 8 - #40                 8 -      #45                                                                              - - tgg aac aca tac ctt cga tat att act gtc ca - #c aag agc tta att        ttt   2715                                                                      Trp Asn Thr Tyr Leu Arg Tyr Ile Thr Val Hi - #s Lys Ser Leu Ile Phe                          850  - #               855  - #               860              - - gtg cta att tgg tgc tta gta att ttt ctg gc - #a gag gtg gct gct tct       2763                                                                         Val Leu Ile Trp Cys Leu Val Ile Phe Leu Al - #a Glu Val Ala Ala Ser                       865      - #           870      - #           875                  - - ttg gtt gtg ctg tgg ctc ctt gga aac act cc - #t ctt caa gac aaa ggg       2811                                                                         Leu Val Val Leu Trp Leu Leu Gly Asn Thr Pr - #o Leu Gln Asp Lys Gly                   880          - #       885          - #       890                      - - aat agt act cat agt aga aat aac agc tat gc - #a gtg att atc acc agc       2859                                                                         Asn Ser Thr His Ser Arg Asn Asn Ser Tyr Al - #a Val Ile Ile Thr Ser               895              - #   900              - #   905                          - - acc agt tcg tat tat gtg ttt tac att tac gt - #g gga gta gcc gac act       2907                                                                         Thr Ser Ser Tyr Tyr Val Phe Tyr Ile Tyr Va - #l Gly Val Ala Asp Thr           910                 
9 - #15                 9 - #20                 9 -      #25                                                                              - - ttg ctt gct atg gga ttc ttc aga ggt cta cc - #a ctg gtg cat act        cta   2955                                                                      Leu Leu Ala Met Gly Phe Phe Arg Gly Leu Pr - #o Leu Val His Thr Leu                          930  - #               935  - #               940              - - atc aca gtg tcg aaa att tta cac cac aaa at - #g tta cat tct gtt ctt       3003                                                                         Ile Thr Val Ser Lys Ile Leu His His Lys Me - #t Leu His Ser Val Leu                       945      - #           950      - #           955                  - - caa gca cct atg tca acc ctc aac acg ttg aa - #a gca ggt ggg att ctt       3051                                                                         Gln Ala Pro Met Ser Thr Leu Asn Thr Leu Ly - #s Ala Gly Gly Ile Leu                   960          - #       965          - #       970                      - - aat aga ttc tcc aaa gat ata gca att ttg ga - #t gac ctt ctg cct cct       3099                                                                         Asn Arg Phe Ser Lys Asp Ile Ala Ile Leu As - #p Asp Leu Leu Pro Pro               975              - #   980              - #   985                          - - acc ata ttt gac ttc atc cag ttg tta tta at - #t gtg att gga gct ata       3147                                                                         Thr Ile Phe Asp Phe Ile Gln Leu Leu Leu Il - #e Val Ile Gly Ala Ile           990                 9 - #95                1000 - #               1005         - - gca gtt gtc gca gtt tta caa ccc tac atc tt - #t gtt gca aca gtg cca       3195                                                                         Ala Val Val Ala Val Leu Gln Pro Tyr Ile Ph - #e Val Ala Thr Val Pro                          1010  - #              1015   - #             1020     
         - - gtg ata gtg gct ttt att atg ttg aga gca ta - #t ttc ctc caa acc tca       3243                                                                         Val Ile Val Ala Phe Ile Met Leu Arg Ala Ty - #r Phe Leu Gln Thr Ser                      1025      - #          1030       - #         1035                  - - cag caa ctc aaa caa ctg gaa tct gaa ggc ag - #g agt cca att ttc act       3291                                                                         Gln Gln Leu Lys Gln Leu Glu Ser Glu Gly Ar - #g Ser Pro Ile Phe Thr                  1040          - #      1045           - #     1050                      - - cat ctt gtt aca agc tta aaa gga cta tgg ac - #a ctt cgt gcc ttc gga       3339                                                                         His Leu Val Thr Ser Leu Lys Gly Leu Trp Th - #r Leu Arg Ala Phe Gly              1055              - #  1060               - # 1065                          - - cgg cag cct tac ttt gaa act ctg ttc cac aa - #a gct ctg aat tta cat       3387                                                                         Arg Gln Pro Tyr Phe Glu Thr Leu Phe His Ly - #s Ala Leu Asn Leu His           1070               1075 - #               1080  - #              1085          - - act gcc aac tgg ttc ttg tac ctg tca aca ct - #g cgc tgg ttc caa atg       3435                                                                         Thr Ala Asn Trp Phe Leu Tyr Leu Ser Thr Le - #u Arg Trp Phe Gln Met                          1090  - #              1095   - #             1100              - - aga ata gaa atg att ttt gtc atc ttc ttc at - #t gct gtt acc ttc att       3483                                                                         Arg Ile Glu Met Ile Phe Val Ile Phe Phe Il - #e Ala Val Thr Phe Ile                      1105      - #          1110       - #         1115                  - - tcc att tta aca aca gga gaa gga gaa gga ag - #a gtt ggt att atc ctg       3531                                     
                                    Ser Ile Leu Thr Thr Gly Glu Gly Glu Gly Ar - #g Val Gly Ile Ile Leu                  1120          - #      1125           - #     1130                      - - act tta gcc atg aat atc atg agt aca ttg ca - #g tgg gct gta aac tcc       3579                                                                         Thr Leu Ala Met Asn Ile Met Ser Thr Leu Gl - #n Trp Ala Val Asn Ser              1135              - #  1140               - # 1145                          - - agc ata gat gtg gat agc ttg atg cga tct gt - #g agc cga gtc ttt aag       3627                                                                         Ser Ile Asp Val Asp Ser Leu Met Arg Ser Va - #l Ser Arg Val Phe Lys           1150               1155 - #               1160  - #              1165          - - ttc att gac atg cca aca gaa ggt aaa cct ac - #c aag tca acc aaa cca       3675                                                                         Phe Ile Asp Met Pro Thr Glu Gly Lys Pro Th - #r Lys Ser Thr Lys Pro                          1170  - #              1175   - #             1180              - - tac aag aat ggc caa ctc tcg aaa gtt atg at - #t att gag aat tca cac       3723                                                                         Tyr Lys Asn Gly Gln Leu Ser Lys Val Met Il - #e Ile Glu Asn Ser His                      1185      - #          1190       - #         1195                  - - gtg aag aaa gat gac atc tgg ccc tca ggg gg - #c caa atg act gtc aaa       3771                                                                         Val Lys Lys Asp Asp Ile Trp Pro Ser Gly Gl - #y Gln Met Thr Val Lys                  1200          - #      1205           - #     1210                      - - gat ctc aca gca aaa tac aca gaa ggt gga aa - #t gcc ata tta gag aac       3819                                                                         Asp Leu Thr Ala Lys Tyr Thr Glu Gly Gly As - #n Ala Ile Leu Glu Asn              1215       
       - #  1220               - # 1225                          - - att tcc ttc tca ata agt cct ggc cag agg gt - #g ggc ctc ttg gga aga       3867                                                                         Ile Ser Phe Ser Ile Ser Pro Gly Gln Arg Va - #l Gly Leu Leu Gly Arg           1230               1235 - #               1240  - #              1245          - - act gga tca ggg aag agt act ttg tta tca gc - #t ttt ttg aga cta ctg       3915                                                                         Thr Gly Ser Gly Lys Ser Thr Leu Leu Ser Al - #a Phe Leu Arg Leu Leu                          1250  - #              1255   - #             1260              - - aac act gaa gga gaa atc cag atc gat ggt gt - #g tct tgg gat tca ata       3963                                                                         Asn Thr Glu Gly Glu Ile Gln Ile Asp Gly Va - #l Ser Trp Asp Ser Ile                      1265      - #          1270       - #         1275                  - - act ttg caa cag tgg agg aaa gcc ttt gga gt - #g ata cca cag aaa gta       4011                                                                         Thr Leu Gln Gln Trp Arg Lys Ala Phe Gly Va - #l Ile Pro Gln Lys Val                  1280          - #      1285           - #     1290                      - - ttt att ttt tct gga aca ttt aga aaa aac tt - #g gat ccc tat gaa cag       4059                                                                         Phe Ile Phe Ser Gly Thr Phe Arg Lys Asn Le - #u Asp Pro Tyr Glu Gln              1295              - #  1300               - # 1305                          - - tgg agt gat caa gaa ata tgg aaa gtt gca ga - #t gag gtt ggg ctc aga       4107                                                                         Trp Ser Asp Gln Glu Ile Trp Lys Val Ala As - #p Glu Val Gly Leu Arg           1310               1315 - #               1320  - #              1325          - - tct gtg ata gaa cag ttt cct ggg aag ctt ga - #c ttt gtc 
ctt gtg gat       4155                                                                         Ser Val Ile Glu Gln Phe Pro Gly Lys Leu As - #p Phe Val Leu Val Asp                          1330  - #              1335   - #             1340              - - ggg ggc tgt gtc cta agc cat ggc cac aag ca - #g ttg atg tgc ttg gct       4203                                                                         Gly Gly Cys Val Leu Ser His Gly His Lys Gl - #n Leu Met Cys Leu Ala                      1345      - #          1350       - #         1355                  - - aga tct gtt ctc agt aag gcg aag atc ttg ct - #g ctt gat gaa ccc agt       4251                                                                         Arg Ser Val Leu Ser Lys Ala Lys Ile Leu Le - #u Leu Asp Glu Pro Ser                  1360          - #      1365           - #     1370                      - - gct cat ttg gat cca gta aca tac caa ata at - #t aga aga act cta aaa       4299                                                                         Ala His Leu Asp Pro Val Thr Tyr Gln Ile Il - #e Arg Arg Thr Leu Lys              1375              - #  1380               - # 1385                          - - caa gca ttt gct gat tgc aca gta att ctc tg - #t gaa cac agg ata gaa       4347                                                                         Gln Ala Phe Ala Asp Cys Thr Val Ile Leu Cy - #s Glu His Arg Ile Glu           1390               1395 - #               1400  - #              1405          - - gca atg ctg gaa tgc caa caa ttt ttg gtc at - #a gaa gag aac aaa gtg       4395                                                                         Ala Met Leu Glu Cys Gln Gln Phe Leu Val Il - #e Glu Glu Asn Lys Val                          1410  - #              1415   - #             1420              - - cgg cag tac gat tcc atc cag aaa ctg ctg aa - #c gag agg agc ctc ttc       4443                                                                         Arg Gln Tyr Asp Ser Ile Gln Lys 
Leu Leu As - #n Glu Arg Ser Leu Phe                      1425      - #          1430       - #         1435                  - - cgg caa gcc atc agc ccc tcc gac agg gtg aa - #g ctc ttt ccc cac cgg       4491                                                                         Arg Gln Ala Ile Ser Pro Ser Asp Arg Val Ly - #s Leu Phe Pro His Arg                  1440          - #      1445           - #     1450                      - - aac tca agc aag tgc aag tct aag ccc cag at - #t gct gct ctg aaa gag       4539                                                                         Asn Ser Ser Lys Cys Lys Ser Lys Pro Gln Il - #e Ala Ala Leu Lys Glu              1455              - #  1460               - # 1465                          - - gag aca gaa gaa gag gtg caa gat aca agg ct - #t tagagagcag cataaatgt    t 4592                                                                         Glu Thr Glu Glu Glu Val Gln Asp Thr Arg Le - #u                               1470               1475 - #               1480                                 - - gacatgggac atttgctcat ggaattggag ctcgtgggac agtcacctca tg -             #gaattgga 4652                                                                   - - gctcgtggaa cagttacctc tgcctcagaa aacaaggatg aattaagttt tt -            #ttttaaaa 4712                                                                   - - aagaaacatt tggtaagggg aattgaggac actgatatgg gtcttgataa at -            #ggcttcct 4772                                                                   - - ggcaatagtc aaattgtgtg aaaggtactt caaatccttg aagatttacc ac -            #ttgtgttt 4832                                                                   - - tgcaagccag attttcctga aaacccttgc catgtgctag taattggaaa gg -            #cagctcta 4892                                                                   - - aatgtcaatc agcctagttg atcagcttat tgtctagtga aactcgttaa tt -            #tgtagtgt 4952                                                                   - 
tggagaagaa ctgaaatcat acttcttagg gttatgatta agtaatgata actggaaact   5012
tcagcggttt atataagctt gtattccttt ttctctcctc tccccatgat gtttagaaac   5072
acaactatat tgtttgctaa gcattccaac tatctcattt ccaagcaagt attagaatac   5132
cacaggaacc acaagactgc acatcaaaat atgccccatt caacatctag tgagcagtca   5192
ggaaagagaa cttccagatc ctggaaatca gggttagtat tgtccaggtc taccaaaaat   5252
ctcaatattt cagataatca caatacatcc cttacctggg aaagggctgt tataatcttt   5312
cacaggggac aggatggttc ccttgatgaa gaagttgata tgccttttcc caactccaga   5372
aagtgacaag ctcacagacc tttgaactag agtttagctg gaaaagtatg ttagtgcaaa   5432
ttgtcacagg acagcccttc tttccacaga agctccaggt agagggtgtg taagtagata   5492
ggccatgggc actgtgggta gacacacatg aagtccaagc atttagatgt ataggttgat   5552
ggtggtatgt tttcaggcta gatgtatgta cttcatgctg tctacactaa gagagaatga   5612
gagacacact gaagaagcac caatcatgaa ttagttttat atgcttctgt tttataattt   5672
tgtgaagcaa aattttttct ctaggaaata tttattttaa taatgtttca aacatatatt   5732
acaatgctgt attttaaaag aatgattatg aattacattt gtataaaata atttttatat   5792
ttgaaatatt gactttttat ggcactagta tttttatgaa atattatgtt aaaactggga   5852
caggggagaa cctagggtga tattaaccag gggccatgaa tcaccttttg gtctggaggg   5912
aagccttggg gctgatcgag ttgttgccca cagctgtatg attcccagcc agacacagcc   5972
tcttagatgc agttctgaag aagatggtac caccagtctg actgtttcca tcaagggtac   6032
actgccttct caactccaaa ctgactctta agaagactgc attatattta ttactgtaag   6092
aaaatatcac ttgtcaataa aatccataca tttgtgta                           6130

<210> SEQ ID NO 2
<211> LENGTH: 1480
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 2

Met Gln Arg Ser Pro Leu Glu Lys Ala Ser Val Val Ser Lys Leu Phe
1               5                   10                  15
Phe Ser Trp Thr Arg Pro Ile Leu Arg Lys Gly Tyr Arg Gln Arg Leu
            20                  25                  30
Glu Leu Ser Asp Ile Tyr Gln Ile Pro Ser Val Asp Ser Ala Asp Asn
        35                  40                  45
Leu Ser Glu Lys Leu
Glu Arg Glu Trp Asp Ar - #g Glu Leu Ala Ser Lys           50             - #     55             - #     60                          - - Lys Asn Pro Lys Leu Ile Asn Ala Leu Arg Ar - #g Cys Phe Phe Trp Arg       65                 - # 70                 - # 75                 - # 80       - - Phe Met Phe Tyr Gly Ile Phe Leu Tyr Leu Gl - #y Glu Val Thr Lys Ala                       85 - #                 90 - #                 95              - - Val Gln Pro Leu Leu Leu Gly Arg Ile Ile Al - #a Ser Tyr Asp Pro Asp                  100      - #           105      - #           110                  - - Asn Lys Glu Glu Arg Ser Ile Ala Ile Tyr Le - #u Gly Ile Gly Leu Cys              115          - #       120          - #       125                      - - Leu Leu Phe Ile Val Arg Thr Leu Leu Leu Hi - #s Pro Ala Ile Phe Gly          130              - #   135              - #   140                          - - Leu His His Ile Gly Met Gln Met Arg Ile Al - #a Met Phe Ser Leu Ile      145                 1 - #50                 1 - #55                 1 -      #60                                                                              - - Tyr Lys Lys Thr Leu Lys Leu Ser Ser Arg Va - #l Leu Asp Lys Ile        Ser                                                                                             165  - #               170  - #               175             - - Ile Gly Gln Leu Val Ser Leu Leu Ser Asn As - #n Leu Asn Lys Phe Asp                  180      - #           185      - #           190                  - - Glu Gly Leu Ala Leu Ala His Phe Val Trp Il - #e Ala Pro Leu Gln Val              195          - #       200          - #       205                      - - Ala Leu Leu Met Gly Leu Ile Trp Glu Leu Le - #u Gln Ala Ser Ala Phe          210              - #   215              - #   220                          - - Cys Gly Leu Gly Phe Leu Ile Val Leu Ala Le - #u Phe Gln Ala Gly Leu      225                 2 - #30                 2 - #35                 2 -    
  #40                                                                              - - Gly Arg Met Met Met Lys Tyr Arg Asp Gln Ar - #g Ala Gly Lys Ile        Ser                                                                                             245  - #               250  - #               255             - - Glu Arg Leu Val Ile Thr Ser Glu Met Ile Gl - #u Asn Ile Gln Ser Val                  260      - #           265      - #           270                  - - Lys Ala Tyr Cys Trp Glu Glu Ala Met Glu Ly - #s Met Ile Glu Asn Leu              275          - #       280          - #       285                      - - Arg Gln Thr Glu Leu Lys Leu Thr Arg Lys Al - #a Ala Tyr Val Arg Tyr          290              - #   295              - #   300                          - - Phe Asn Ser Ser Ala Phe Phe Phe Ser Gly Ph - #e Phe Val Val Phe Leu      305                 3 - #10                 3 - #15                 3 -      #20                                                                              - - Ser Val Leu Pro Tyr Ala Leu Ile Lys Gly Il - #e Ile Leu Arg Lys        Ile                                                                                             325  - #               330  - #               335             - - Phe Thr Thr Ile Ser Phe Cys Ile Val Leu Ar - #g Met Ala Val Thr Arg                  340      - #           345      - #           350                  - - Gln Phe Pro Trp Ala Val Gln Thr Trp Tyr As - #p Ser Leu Gly Ala Ile              355          - #       360          - #       365                      - - Asn Lys Ile Gln Asp Phe Leu Gln Lys Gln Gl - #u Tyr Lys Thr Leu Glu          370              - #   375              - #   380                          - - Tyr Asn Leu Thr Thr Thr Glu Val Val Met Gl - #u Asn Val Thr Ala Phe      385                 3 - #90                 3 - #95                 4 -      #00                                                                              - - Trp Glu Glu Gly Phe Gly Glu Leu Phe Glu 
Ly - #s Ala Lys Gln Asn        Asn                                                                                             405  - #               410  - #               415             - - Asn Asn Arg Lys Thr Ser Asn Gly Asp Asp Se - #r Leu Phe Phe Ser Asn                  420      - #           425      - #           430                  - - Phe Ser Leu Leu Gly Thr Pro Val Leu Lys As - #p Ile Asn Phe Lys Ile              435          - #       440          - #       445                      - - Glu Arg Gly Gln Leu Leu Ala Val Ala Gly Se - #r Thr Gly Ala Gly Lys          450              - #   455              - #   460                          - - Thr Ser Leu Leu Met Met Ile Met Gly Glu Le - #u Glu Pro Ser Glu Gly      465                 4 - #70                 4 - #75                 4 -      #80                                                                              - - Lys Ile Lys His Ser Gly Arg Ile Ser Phe Cy - #s Ser Gln Phe Ser        Trp                                                                                             485  - #               490  - #               495             - - Ile Met Pro Gly Thr Ile Lys Glu Asn Ile Il - #e Phe Gly Val Ser Tyr                  500      - #           505      - #           510                  - - Asp Glu Tyr Arg Tyr Arg Ser Val Ile Lys Al - #a Cys Gln Leu Glu Glu              515          - #       520          - #       525                      - - Asp Ile Ser Lys Phe Ala Glu Lys Asp Asn Il - #e Val Leu Gly Glu Gly          530              - #   535              - #   540                          - - Gly Ile Thr Leu Ser Gly Gly Gln Arg Ala Ar - #g Ile Ser Leu Ala Arg      545                 5 - #50                 5 - #55                 5 -      #60                                                                              - - Ala Val Tyr Lys Asp Ala Asp Leu Tyr Leu Le - #u Asp Ser Pro Phe        Gly                                                                                             
565  - #               570  - #               575             - - Tyr Leu Asp Val Leu Thr Glu Lys Glu Ile Ph - #e Glu Ser Cys Val Cys                  580      - #           585      - #           590                  - - Lys Leu Met Ala Asn Lys Thr Arg Ile Leu Va - #l Thr Ser Lys Met Glu              595          - #       600          - #       605                      - - His Leu Lys Lys Ala Asp Lys Ile Leu Ile Le - #u His Glu Gly Ser Ser          610              - #   615              - #   620                          - - Tyr Phe Tyr Gly Thr Phe Ser Glu Leu Gln As - #n Leu Gln Pro Asp Phe      625                 6 - #30                 6 - #35                 6 -      #40                                                                              - - Ser Ser Lys Leu Met Gly Cys Asp Ser Phe As - #p Gln Phe Ser Ala        Glu                                                                                             645  - #               650  - #               655             - - Arg Arg Asn Ser Ile Leu Thr Glu Thr Leu Hi - #s Arg Phe Ser Leu Glu                  660      - #           665      - #           670                  - - Gly Asp Ala Pro Val Ser Trp Thr Glu Thr Ly - #s Lys Gln Ser Phe Lys              675          - #       680          - #       685                      - - Gln Thr Gly Glu Phe Gly Glu Lys Arg Lys As - #n Ser Ile Leu Asn Pro          690              - #   695              - #   700                          - - Ile Asn Ser Ile Arg Lys Phe Ser Ile Val Gl - #n Lys Thr Pro Leu Gln      705                 7 - #10                 7 - #15                 7 -      #20                                                                              - - Met Asn Gly Ile Glu Glu Asp Ser Asp Glu Pr - #o Leu Glu Arg Arg        Leu                                                                                             725  - #               730  - #               735             - - Ser Leu Val Pro Asp Ser Glu Gln Gly Glu Al - #a Ile Leu Pro 
Arg Ile                  740      - #           745      - #           750                  - - Ser Val Ile Ser Thr Gly Pro Thr Pro Gln Al - #a Arg Arg Arg Gln Ser              755          - #       760          - #       765                      - - Val Leu Asn Leu Met Thr His Ser Val Asn Gl - #n Gly Gln Asn Ile His          770              - #   775              - #   780                          - - Arg Lys Thr Thr Ala Ser Thr Arg Lys Val Se - #r Leu Ala Pro Gln Ala      785                 7 - #90                 7 - #95                 8 -      #00                                                                              - - Asn Leu Thr Glu Leu Asp Ile Tyr Ser Arg Ar - #g Leu Ser Gln Glu        Thr                                                                                             805  - #               810  - #               815             - - Gly Leu Glu Ile Ser Glu Glu Ile Asn Glu Gl - #u Asp Leu Lys Glu Cys                  820      - #           825      - #           830                  - - Phe Phe Asp Asp Met Glu Ser Ile Pro Ala Va - #l Thr Thr Trp Asn Thr              835          - #       840          - #       845                      - - Tyr Leu Arg Tyr Ile Thr Val His Lys Ser Le - #u Ile Phe Val Leu Ile          850              - #   855              - #   860                          - - Trp Cys Leu Val Ile Phe Leu Ala Glu Val Al - #a Ala Ser Leu Val Val      865                 8 - #70                 8 - #75                 8 -      #80                                                                              - - Leu Trp Leu Leu Gly Asn Thr Pro Leu Gln As - #p Lys Gly Asn Ser        Thr                                                                                             885  - #               890  - #               895             - - His Ser Arg Asn Asn Ser Tyr Ala Val Ile Il - #e Thr Ser Thr Ser Ser                  900      - #           905      - #           910                  - - Tyr Tyr Val Phe Tyr Ile Tyr Val 
Gly Val Al - #a Asp Thr Leu Leu Ala              915          - #       920          - #       925                      - - Met Gly Phe Phe Arg Gly Leu Pro Leu Val Hi - #s Thr Leu Ile Thr Val          930              - #   935              - #   940                          - - Ser Lys Ile Leu His His Lys Met Leu His Se - #r Val Leu Gln Ala Pro      945                 9 - #50                 9 - #55                 9 -      #60                                                                              - - Met Ser Thr Leu Asn Thr Leu Lys Ala Gly Gl - #y Ile Leu Asn Arg        Phe                                                                                             965  - #               970  - #               975             - - Ser Lys Asp Ile Ala Ile Leu Asp Asp Leu Le - #u Pro Pro Thr Ile Phe                  980      - #           985      - #           990                  - - Asp Phe Ile Gln Leu Leu Leu Ile Val Ile Gl - #y Ala Ile Ala Val Val              995          - #      1000           - #     1005                      - - Ala Val Leu Gln Pro Tyr Ile Phe Val Ala Th - #r Val Pro Val Ile Val         1010              - #  1015               - # 1020                          - - Ala Phe Ile Met Leu Arg Ala Tyr Phe Leu Gl - #n Thr Ser Gln Gln Leu      1025                1030 - #                1035 - #               1040        - - Lys Gln Leu Glu Ser Glu Gly Arg Ser Pro Il - #e Phe Thr His Leu Val                     1045  - #              1050   - #             1055              - - Thr Ser Leu Lys Gly Leu Trp Thr Leu Arg Al - #a Phe Gly Arg Gln Pro                 1060      - #          1065       - #          1070                 - - Tyr Phe Glu Thr Leu Phe His Lys Ala Leu As - #n Leu His Thr Ala Asn             1075          - #      1080           - #     1085                      - - Trp Phe Leu Tyr Leu Ser Thr Leu Arg Trp Ph - #e Gln Met Arg Ile Glu         1090              - #  1095               - # 1100                          - - Met 
Ile Phe Val Ile Phe Phe Ile Ala Val Th - #r Phe Ile Ser Ile Leu      1105                1110 - #                1115 - #               1120        - - Thr Thr Gly Glu Gly Glu Gly Arg Val Gly Il - #e Ile Leu Thr Leu Ala                     1125  - #              1130   - #             1135              - - Met Asn Ile Met Ser Thr Leu Gln Trp Ala Va - #l Asn Ser Ser Ile Asp                 1140      - #          1145       - #          1150                 - - Val Asp Ser Leu Met Arg Ser Val Ser Arg Va - #l Phe Lys Phe Ile Asp             1155          - #      1160           - #     1165                      - - Met Pro Thr Glu Gly Lys Pro Thr Lys Ser Th - #r Lys Pro Tyr Lys Asn         1170              - #  1175               - # 1180                          - - Gly Gln Leu Ser Lys Val Met Ile Ile Glu As - #n Ser His Val Lys Lys      1185                1190 - #                1195 - #               1200        - - Asp Asp Ile Trp Pro Ser Gly Gly Gln Met Th - #r Val Lys Asp Leu Thr                     1205  - #              1210   - #             1215              - - Ala Lys Tyr Thr Glu Gly Gly Asn Ala Ile Le - #u Glu Asn Ile Ser Phe                 1220      - #          1225       - #          1230                 - - Ser Ile Ser Pro Gly Gln Arg Val Gly Leu Le - #u Gly Arg Thr Gly Ser             1235          - #      1240           - #     1245                      - - Gly Lys Ser Thr Leu Leu Ser Ala Phe Leu Ar - #g Leu Leu Asn Thr Glu         1250              - #  1255               - # 1260                          - - Gly Glu Ile Gln Ile Asp Gly Val Ser Trp As - #p Ser Ile Thr Leu Gln      1265                1270 - #                1275 - #               1280        - - Gln Trp Arg Lys Ala Phe Gly Val Ile Pro Gl - #n Lys Val Phe Ile Phe                     1285  - #              1290   - #             1295              - - Ser Gly Thr Phe Arg Lys Asn Leu Asp Pro Ty - #r Glu Gln Trp Ser Asp                 1300      - #          1305       - #          
1310
Gln Glu Ile Trp Lys Val Ala Asp Glu Val Gly Leu Arg Ser Val Ile
        1315                1320                1325
Glu Gln Phe Pro Gly Lys Leu Asp Phe Val Leu Val Asp Gly Gly Cys
    1330                1335                1340
Val Leu Ser His Gly His Lys Gln Leu Met Cys Leu Ala Arg Ser Val
1345                1350                1355                1360
Leu Ser Lys Ala Lys Ile Leu Leu Leu Asp Glu Pro Ser Ala His Leu
                1365                1370                1375
Asp Pro Val Thr Tyr Gln Ile Ile Arg Arg Thr Leu Lys Gln Ala Phe
            1380                1385                1390
Ala Asp Cys Thr Val Ile Leu Cys Glu His Arg Ile Glu Ala Met Leu
        1395                1400                1405
Glu Cys Gln Gln Phe Leu Val Ile Glu Glu Asn Lys Val Arg Gln Tyr
    1410                1415                1420
Asp Ser Ile Gln Lys Leu Leu Asn Glu Arg Ser Leu Phe Arg Gln Ala
1425                1430                1435                1440
Ile Ser Pro Ser Asp Arg Val Lys Leu Phe Pro His Arg Asn Ser Ser
                1445                1450                1455
Lys Cys Lys Ser Lys Pro Gln Ile Ala Ala Leu Lys Glu Glu Thr Glu
            1460                1465                1470
Glu Glu Val Gln Asp Thr Arg Leu
        1475                1480

<210> SEQ ID NO 3
<211> LENGTH: 22846
<212> TYPE: DNA
<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 3

ccacccttgg agttcactca cctaaacctc aaactaataa agcttggttc ttttctccga     60
cacgcaaagg aagcgctaag gtaaatgcat cagacccaca ctgccgcgga acttttcggc    120
tctctaaggc tgtattttga tatacgaaag gcacattttc cttccctttt caaaatgcac    180
cttgcaaacg taacagggac ccgactagga tcatcgggaa aaggaggagg aggaggaagg    240
caggctccgg ggaagctggt ggcagcgggt cctgggtctg gcggaccctg acgcgaagga    300
gggtctagga agctctccgg ggagccgttc tcccgccggt ggcttcttct gtcctccagc    360
gttgccaact ggacctaaag agaggccgcg actgtcgccc acctgcggga tgggcctggt    420
gctgggcggt aaggacacgg acctggaagg agcgcgcgcg agggagggag gctgggagtc    480
agaatcggga aagggaggtg cggggcggcg agggagcgaa ggaggagagg aggaaggagc    540
gggaggggtg ctggcggggg tgcgtagtgg gtggagaaag ccgctagagc aaatttgggg    600
ccggaccagg cagcactcgg cttttaacct gggcagtgaa ggcgggggaa agagcaaaag    660
gaaggggtgg tgtgcggagt aggggtgggt ggggggaatt ggaagcaaat gacatcacag    720
caggtcagag aaaaagggtt gagcggcagg cacccagagt agtaggtctt tggcattagg    780
agcttgagcc cagacggccc tagcagggac cccagcgccc agagaccatg cagaggtcgc    840
ctctggaaaa ggccagcgtt gtctccaaac tttttttcag gtgagaaggt ggccaaccga    900
gcttcggaaa gacacgtgcc cacgaaagag gagggcgtgt gtatgggttg ggtttggggt    960
aaaggaataa gcagttttta aaaagatgcg ctatcattca ttgttttgaa agaaaatgtg    1020
ggtattgtag aataaaacag aaagcattaa gaagagatgg aagaatgaac tgaagctgat    1080
tgaatagaga gccacatcta cttgcaactg aaaagttaga atctcaagac tcaagtacgc    1140
tactatgcac ttgttttatt tcatttttct aagaaactaa aaatacttgt taataagtac    1200
ctangtatgg tttattggtt ttcccccttc atgccttgga cacttgattg tcttcttggc    1260
acatacaggt gccatgcctg catatagtaa gtgctcagaa aacatttctt gactgaattc    1320
agccaacaaa aattttgggg taggtagaaa atatatgctt aaagtattta ttgttatgag    1380
actggatata tctagtattt gtcacaggta aatgattctt caaaaattga aagcaaattt    1440
gttgaaatat ttattttgaa aaaagttact tcacaagcta taaattttaa aagccatagg    1500
aatagatacc gaagttatat ccaactgaca tttaataaat tgtattcata gcctaatgtg    1560
atgagccaca gaagcttaaa ccatactatt attccctccc aatccctttg acaaagtgac    1620
agtcacatta gttcagagat attgatgttt tatacaggtg tagcctgtaa gagatgaagc    1680
ctggtattta tagaaattga cttattttat tctcatattt acatgtgcat aattttccat    1740
atgccagaaa agttgaatag tatcagattc caaatctgta tggagaccaa atcaagtgaa    1800
tatctgttcc tcctctcttt attttagctg gaccagacca attttgagga aaggatacag    1860
acagcgcctg gaattgtcag acatatacca aatcccttct gttgattctg ctgacaatct    1920
atctgaaaaa ttggaaaggt atgttcatgt acattgttta gttgaagaga gaaattcata    1980
ttattaatta tttagagaag agaaagcaaa catattataa gtttaattct tatatttaaa    2040
aataggagcc aagtatggtg gctaatgcct gtaatcccaa ctatttggga ggccaagatg    2100
agaggattgc ttgagaccag gagtttgata ccagcctggg caacatagca agatgttatc    2160
tctacacaaa ataaaaagtt agctgggaat ggtagtgcat gcttgtaagg aatctgccag    2220
atatctggct gagtgtttgg tgttgtatgg tctccatgag attttgtctc tataatactt    2280
gggttaatct ccttggatat acttgtgtga atcaaactat gttaagggaa ataggacaac    2340
taaaatattt gcacatgcaa cttattggtc ccacttttta ttcttttgca gagaatggga    2400
tagagagctg gcttcaaaga aaaatcctaa actcattaat gcccttcggc gatgtttttt    2460
ctggagattt atgttctatg gaatcttttt atatttaggg gtaaggatct catttgtaca    2520
ttcattatgt atcacataac tatatgcatt tttgtgatta tgaaaagact acgaaatctg    2580
gtgaataggt gtaaaaatat aaaggatgaa tccaactcca aacactaaga aaccacctaa    2640
aactctagta aggataagta accactattc actgtttaac ttaaaatacc tcatatgtaa    2700
acttgtctcc cactgttgct ataacaaatc ccaagtctta tttcaaagta ccaagatatt    2760
gaaaatagtg ctaagagttt cacatatggt atgaccctct atataaactc attttaagtc    2820
tcctctaaag atgaaaagtc ttgtgttgaa attctcaggg tattttatga gaaataaatg    2880
aaatttaatt tctctgtttt tccccttttg taggaagtca ccaaagcagt acagcctctc    2940
ttactgggaa gaatcatagc ttcctatgac ccggataaca aggaggaacg ctctatcgcg    3000
atttatctag gcataggctt atgccttctc tttattgtga ggacactgct cctacaccca    3060
gccatttttg gccttcatca cattggaatg cagatgagaa tagctatgtt tagtttgatt    3120
tataagaagg taatacttcc ttgcacaggc cccatggcac atatattctg tatcgtacat    3180
gttttaatgt cataaattag gtagtgagct ggtacaagta agggataaat gctgaaatta    3240
atttaatatg cctattaaat aaatggcagg aataattaat gctcttaatt atccttgata    3300
atttaattga cttaaactga taattattga gtatctaatt atttctgcct agatgctggg    3360
aaataaaaca actagaagca tgccagtata atattgactg ttgaaagaaa catttatgaa    3420
cctgagaaga tagtaagcta gatgaataga atataatttt cattaccttt acttaataat    3480
gaatgcataa taactgaatt agtcatatta taattttact tataatatat ttgtattttg    3540
tttgttgaaa ttatctaact ttccattttt cttttagact ttaaagctgt caagccgtgt    3600
tctagataaa ataagtattg gacaacttgt tagtctcctt tccaacaacc tgaacaaatt    3660
tgatgaagta tgtacctatt gatttaatct tttaggcact attgttataa attatacaac    3720
tggaaaggcg gagttttcct gggtcagata atagtaatta gtggttaagt cttgctcagc    3780
tctagcttcc ctattctgga aactaagaaa ggtcaattgt atagcagagc accattctgg    3840
ggtctggtag aaccacccaa ctcaaaggca ccttagcctg ttgttaataa gatttttcaa    3900
aacttaattc ttatcagacc ttgcttcttt taaacgacat gatacttaag atgtccaatc    3960
ttgattccac tgaataaaaa tatgcttaaa aatgcactga cttgaaattt gttttttggg    4020
aaaaccgatt ctatgtgtag aatgtttaag cacattgcta tgtgctccat gtaatgatta    4080
cctagatttt agtgtgctca gaaccacgaa gtgtttgatc atataagctc cttttacttg    4140
ctttctttca tatatgattg ttagtttcta ggggtggaag atacaatgac acctgttttt    4200
gctgtgcttt tattttccag ggacttgcat tggcacattt cgtgtggatc gctcctttgc    4260
aagtggcact cctcatgggg ctaatctggg agttgttaca ggcgtctgcc ttctgtggac    4320
ttggtttcct gatagtcctt gccctttttc aggctgggct agggagaatg atgatgaagt    4380
acaggtagca acctattttc ataacttgaa agttttaaaa attatgtttt caaaaagccc    4440
actttagtaa aaccaggact gctctatgca tagaacagtg atcttcagtg tcattaaatt    4500
tttttttttt ttttttttga gacagagtct agatctgtca cccaggctgg agtgcagtgg    4560
cacgatcttg gctcactgca ctgcaacttc tgcctcccag gctcaagcaa ttctcctgcc    4620
tcagcctccg gagtagctgg gattagaggc gcatgcacca cacccagcta atttttgtat    4680
tttagtagag acagggtttc accaggttgc ccaggctggt ctcgaatgcc tgacctcagg    4740
tgatccgccc acctcggcct cccaaagtac tgatattaca ggcatgagct accgcgcccg    4800
gcctaaaaaa tactttttaa gatggtgtaa atattacttt ctgtatcaat ggtacatttt    4860
ttacttgtca gtctctagaa tttctttata aatatgttga ttcagttcat ttttgtagat    4920
tataaaacag gtaaaaaagg ataaaacatt tatgtgaatt aaagggaata cctaattttt    4980
gtgtagagtt tattagcttt tactactctg gtttatggat catcacacca gagccttagt    5040
tactttgtgt tacagaataa ctaatatgag tgaatgaatg acttacacaa gtcactgctt    5100
aggataaagg gcttgagttt gtcagctaga gtatgacaga aagtatctaa gttttggagt    5160
caaatagcac tttgtttgaa tcccagattg catgcttact agttatgtga ccttagtcaa    5220
gccacttcac ctcactgagt ctttgctttt ttcatctcta aaatagagat acccaccgct    5280
cataggctgt cataaggata gagatagcat atggaatgag tctgtacagc gtctggcaca    5340
taggaggcat ttaccaaaca gtagttatta tttttgttac catctatttg ataataaaat    5400
aatgcccatc tgttgaataa aagaaatatg acttaaaacc ttgagcagtt cttaatagat    5460
aatttgactt gtttttacta ttagattgat tgattgattg attgattgat ttacagagat    5520
cagagagctg ggaagatcag tgaaagactt gtgattacct cagaaatgat tgaaaatatc    5580
caatctgtta aggcatactg ctgggaagaa gcaatggaaa aaatgattga aaacttaaga    5640
cagtaagttg ttccaataat ttcaatattg ttagtaattc tgtccttaat tttttaaaaa    5700
tatgtttatc atggtagact tccacctcat atttgatgtt tgtgacaatc aaatgattgc    5760
atttaagttc tgtcaatatt catgcattag ttgcacaaat tcactttcat gggctgtagt    5820
tttatgtagt tggtccaggg tgttatttta tgctgcaagt atattatact gatacgttat    5880
taaagaattt cctacatatg ttcactgctg ctcaatacat ttatttcgtt aaaaacaatt    5940
atcaagatac tgaaggctga ttggtaactc acatggaact gggagagtat acaattctga    6000
accaaataga tgatttacaa gtactacaag caaaacactg gtactttcat tgttatcttt    6060
tcatataagg taactgaggc ccagagagat taaataacat gcccaaggtc acacaggtca    6120
tatgatgtgg agccaggtta aaaatatagg cagaaagact ctagagacca tgctcagatc    6180
ttccattcca agatccctga tatttgaaaa ataaaataac atcctgaatt ttattgttat    6240
tgttttttat agaacagaac tgaaactgac tcggaaggca gcctatgtga gatacttcaa    6300
tagctcagcc ttcttcttct cagggttctt tgtggtgttt ttatctgtgc ttccctatgc    6360
actaatcaaa ggaatcatcc tccggaaaat attcaccacc atctcattct gcattgttct    6420
gcgcatggcg gtcactcggc aatttccctg ggctgtacaa acatggtatg actctcttgg    6480
agcaataaac aaaatacagg taatgtacca taatgctgca ttatatacta tgatttaaat    6540
aatcagtcaa tagatcagtt ctaatgaact ttgcaaaaat gtgcgaaaag atagaaaaag    6600
aaatttcctt cactaggaag ttataaaagt tgccagctaa tactaggaat gttcacctta    6660
aacttttcct agcatttctc tggacagtat gatggatgag agtggcattt atgcaaatta    6720
ccttaaaatc ccaataatac tgatgtagct agcagctttg agaaagcaca ttagtgggta    6780
attcagggtt gctttgtaaa ttcatcacta aggttagcat gtaatagtac aaggaagaat    6840
cagttgtatg ttaaatctaa tgtataaaaa gttttataaa atatcatatg tttagagagt    6900
atatttcaaa tatgatgaat cctagtgctt ggcaaattaa ctttagaaca ctaataaaat    6960
tattttatta agaaataatt actatttcat tattaaaatt catatataag atgtagcaca    7020
atgagagtat aaagtagatg taataatgca ttaatgctat tctgattcta taatatgttt    7080
ttgctctctt ttataaatag gatttcttac aaaagcaaga atataagaca ttggaatata    7140
acttaacgac tacagaagta gtgatggaga atgtaacagc cttctgggag gaggtcagaa    7200
tttttaaaaa attgtttgct ctaaacacct aactgttttc ttctttgtga atatggattt    7260
catcctaatg gcgaataaaa ttagaatgat gatataactg gtagaactgg aaggaggatc    7320
actcacttat tttctagatt aagaagtaga ggaatggcca ggtgctcatg gttgtaatcc    7380
cagcactttc gggagaccaa ggcgggtgga tcacctgagg tcaggagttc aagaccagcc    7440
tgccaacatg gtaaaacccg gtctctacta aaaatacaaa aaattaactg ggtagtgact    7500
ttaaagctgt gtgactttag tcatttaact gctgagtcac agtctacagc tttgaaagag    7560
gaggattata aaatctatct catgttaatg ctgaagatta aataatagtg tttatgtacc    7620
ccgcttatag gagaagaggg tgtgtgtgtg tgtgtgtgtg tgtgtgtgtg tgtatgtgta    7680
tgtatacatg tatgtattca gtctttactg aaattaaaaa atctttaact tgataatggg    7740
caaatatctt agttttagat catgtcctct agaaaccgta tgctatataa ttatgtacta    7800
taaagtaata atgtatacag tgtaatggat catgggccat gtgcttttca aactaattgt    7860
acataaaaca agcatctatt gaaaatatct gacaaactca tcttttattt ttgatgtgtg    7920
tgtgtgtgtg tgtgtgtgtt tttttaacag ggatttgggg aattatttga gaaagcaaaa    7980
caaaacaata acaatagaaa aacttctaat ggtgatgaca gcctcttctt cagtaatttc    8040
tcacttcttg gtactcctgt cctgaaagat attaatttca agatagaaag aggacagttg    8100
ttggcggttg ctggatccac tggagcaggc aaggtagttc ttttgttctt cactattaag    8160
aacttaattt ggtgtccatg tctctttttt tttctagttt gtagtgctgg aaggtatttt    8220
tggagaaatt cttacatgag cattaggaga atgtatgggt gtagtgtctt gtataataga    8280
aattgttcca ctgataattt actctagttt tttatttcct catattattt tcagtggctt    8340
tttcttccac atctttatat tttgcaccac attcaacact gtatcttgca catggcgagc    8400
attcaataac tttattgaat aaacaaatca tccattttat ccattcttaa ccagaacaga    8460
cattttttca gagctggtcc aggaaaatca tgacttacat tttgccttag taaccacata    8520
aacaaaaagt ctccattttt gttgaccact gtagctgtac taccttccat ctcctcaacc    8580
tattccaact atctgaatca tgtgcccttc tctgtgaacc tctatcataa tacttgtcac    8640
actgtattgt aattgtctct tttactttcc cttgtatctt ttgtgcatag cagagtacct    8700
gaaacaggaa gtattttaaa tattttgaat caaatgagtt aatagaatct ttacaaataa    8760
gaatatacac ttctgcttag gatgataatt ggaggcaagt gaatcctgag cgtgatttga    8820
taatgaccta ataatgatgg gttttatttc cagacttcac ttctaatgat gattatggga    8880
gaactggagc cttcagaggg taaaattaag cacagtggaa gaatttcatt ctgttctcag    8940
ttttcctgga ttatgcctgg caccattaaa gaaaatatca tctttggtgt ttcctatgat    9000
gaatatagat acagaagcgt catcaaagca tgccaactag aagaggtaag aaactatgtg    9060
aaaacttttt gattatgcat atgaaccctt cacactaccc aaattatata tttggctcca    9120
tattcaatcg gttagtctac atatatttat gtttcctcta tgggtaagct actgtgaatg    9180
gatcaattaa taaaacacat gacctatgct ttaagaagct tgcaaacaca tgaaataaat    9240
gcaatttatt ttttaaataa tgggttcatt tgatcacaat aaatgcattt tatgaaatgg    9300
tgagaatttt gttcactcat tagtgagaca aacgtctcaa tggttattta tatggcatgc    9360
atatagtgat atgtggtata tacccataaa tatacacata ttttaatttt tggtatttta    9420
taattattat ttaatgatca ttcatgacat tttaaaaatt acaggaaaaa tttacatcta    9480
aaatttcagc aatgttgttt ttgaccaact aaataaattg catttgaaat aatggagatg    9540
caatgttcaa aatttcaact gtggttaaag caatagtgtg atatatgatt acattagaag    9600
gaagatgtgc ctttcaaatt cagattgagc atactaaaag tgactctcta attttctatt    9660
tttggtaata ggacatctcc aagtttgcag agaaagacaa tatagttctt ggagaaggtg    9720
gaatcacact gagtggaggt caacgagcaa gaatttcttt agcaaggtga ataactaatt    9780
attggtctag caagcatttg ctgtaaatgt cattcatgta aaaaaattac agacatttct    9840
ctattgcttt atattctgtt tctggaattg aaaaaatcct ggggttttat ggctagtggg    9900
ttaagaatca catttaagaa ctataaataa tggtatagta tccagatttg gtagagatta    9960
tggttactca gaatctgtgc ccgtatcttg gcttacagtt agcaaaatca cttcagcagt    10020
tcttggaatg ttgtgaaaag tgataaaaat cttctgcaac ttattccttt attcctcatt    10080
taaaataatc taccatagta aaaacatgta taaaagtgct acttctgcac cacttttgag    10140
aatagtgtta tttcagtgaa tcgatgtggt gaccatattg taatgcatgt agtgaactgt    10200
ttaaggcaaa tcatctacac tagatgacca ggaaatagag aggaaatgta atttaatttc    10260
cattttcttt ttagagcagt atacaaagat gctgatttgt atttattaga ctctcctttt    10320
ggatacctag atgttttaac agaaaaagaa atatttgaaa ggtatgttct ttgaatacct    10380
tacttataat gctcatgcta aaataaaaga aagacagact gtcccatcat agattgcatt    10440
ttacctcttg agaaatatgt tcaccattgt tggtatggca gaatgtagca tggtattaac    10500
tcaaatctga tctgccctac tgggccagga ttcaagatta cttccattaa aaccttttct    10560
caccgcctca tgctaaacca gtttctctca ttgctatact gttatagcaa ttgctatcta    10620
tgtagttttt gcagtatcat tgccttgtga tatatattac tttaattgaa ttcacaaggt    10680
accaatttaa ttactacaga gtacttatag aatcatttaa aatataataa aattgtatga    10740
tagagattat atgcaataaa acattaacaa aatgctaaaa tacgagacat attgcaataa    10800
agtatttata aaattgatat ttatatgttt ttatatctta aagctgtgtc tgtaaactga    10860
tggctaacaa aactaggatt ttggtcactt ctaaaatgga acatttaaag aaagctgaca    10920
aaatattaat tttgcatgaa ggtagcagct atttttatgg gacattttca gaactccaaa    10980
atctacagcc agactttagc tcaaaactca tgggatgtga ttctttcgac caatttagtg    11040
cagaaagaag aaattcaatc ctaactgaga ccttacaccg tttctcatta gaaggagatg    11100
ctcctgtctc ctggacagaa acaaaaaaac aatcttttaa acagactgga gagtttgggg    11160
aaaaaaggaa gaattctatt ctcaatccaa tcaactctat acgaaaattt tccattgtgc    11220
aaaagactcc cttacaaatg aatggcatcg aagaggattc tgatgagcct ttagagagaa    11280
ggctgtcctt agtaccagat tctgagcagg gagaggcgat actgcctcgc atcagcgtga    11340
tcagcactgg ccccacgctt caggcacgaa ggaggcagtc tgtcctgaac ctgatgacac    11400
actcagttaa ccaaggtcag aacattcacc gaaagacaac agcatccaca cgaaaagtgt    11460
cactggcccc tcaggcaaac ttgactgaac tggatatata ttcaagaagg ttatctcaag    11520
aaactggctt ggaaataagt gaagaaatta acgaagaaga cttaaaggta ggtatacatc    11580
gcttgggggt atttcacccc acagaatgca attgagtaga atgcaatatg tagcatgtaa    11640
caaaatttac taaaatcata ggattaggat aaggtgtatc ttaaaactca gaaagtatga    11700
agttcattaa ttatacaagc aacgttaaaa tgtaaaataa caaatgattt ctttttgcaa    11760
tggacatatc tcttcccata aaatgggaaa ggatttagtt tttggtcctc tactaagcca    11820
gtgataactg tgactatagt tagaaagcat ttgctttatt accatcttga accctctgtg    11880
ggaaacttca tttagatggt atcattcatt tgataaaagg tatgccactg ttaagccttt    11940
aatggtaaaa ttgtccaata ataatacagt tatataatca gtgatacatt tttagaattt    12000
tgaaaaatta cgatgtttct catttttaat aaagctgtgt tgctccagta gacattattc    12060
tggctataga atgacatcat acatggcatt tataatgatt tatatttgtt aaaatacact    12120
tagattcaag taatactatt cttttatttt catatattaa aaataaaacc acaatggtgg    12180
catgaaactg tactgtctta ttgtaatagc cataattctt ttattcagga gtgctttttt    12240
gatgatatgg agagcatacc agcagtgact acatggaaca cataccttcg atatattact    12300
gtccacaaga gcttaatttt tgtgctaatt tggtgcttag taatttttct ggcagaggta    12360
agaatgttct attgtaaagt attactggat ttaaagttaa attaagatag tttggggatg    12420
tatacatata tatgcacaca cataaatatg tatatataca catgtataca tgtataagta    12480
tgcatatata cacacatata tcactatatg tatatatgta tatattacat atatttgtga    12540
ttttacagta tataatggta tagattcata tagttcttag cttctgaaaa atcaacaagt    12600
agaaccacta ctgagaattc cattaactta atgtggtctc atcacaaata atagtactta    12660
gaacacctag tacagctgct ggacccagga acacaaagca aaggaagatg aaattgtgtg    12720
taccttgata ttggtacaca catcaaatgg tgtgatgtga atttagatgt gggcatggga    12780
ggaataggtg aagatgttag aaaaaaaatc aactgtgtct tgttccattc caggtggctg    12840
cttctttggt tgtgctgtgg ctccttggaa agtgagtatt ccatgtccta ttgtgtagat    12900
tgtgttttat ttctgttgat taaatattgt aatccactat gtttgtatgt attgtaatcc    12960
actttgtttc atttctccca agcattatgg tagtggaaag ataaggtttt ttgtttaaat    13020
gatgaccatt agttgggtga ggtgacacat tcctgtagtc ctagctcctc cacaggctga    13080
cgcaggagga tcacttgagc ccaggagttc agggctgtag tgttgtatca ttgtgagtag    13140
ccaccaccgc actccagcct ggacaatata gtgagatcct atatctaaaa taaaataaaa    13200
taaaatgaat aaattgtgag catgtgcagc tcctgtccta tatctaaata aataaataaa    13260
tgaataaatt gtgagcatgt gcagctcctg cagtttctaa agaatatagt tctgttcagt    13320
ttctgtgaaa cacaataaaa atatttgaaa taacattaca tatttagggt tttcttcaaa    13380
ttttttaatt taataaagaa caactcaatc tctatcaata gtgagaaaac atatctattt    13440
tcttgcaata atagtatgat tttgaggtta agggtgcatg ctcttctaat gcaaaatatt    13500
gtatttattt agactcaagt ttagttccat ttacatgtat tggaaattca gtaagtaact    13560
ttggctgcca aataacgatt tcctatttgc tttacagcac tcctcttcaa gacaaaggga    13620
atagtactca tagtagaaat aacagctatg cagtgattat caccagcacc agttcgtatt    13680
atgtgtttta catttacgtg ggagtagccg acactttgct tgctatggga ttcttcagag    13740
gtctaccact ggtgcatact ctaatcacag tgtcgaaaat tttacaccac aaaatgttac    13800
attctgttct tcaagcacct atgtcaaccc tcaacacgtt gaaagcaggt actttactag  13860
gtctaagaaa tgaaactgct gatccaccat caatagggcc tgtggttttg ttggttttct  13920
aatggcagtg ctggcttttg cacagaggca tgtgcctttg ttgtaagatt gtaagcagga  13980
tgagtaccca cctattcctg acataattta tagtaaaagc tatttcagag aaattggtcg  14040
ttacttgaat cttacaagaa tctgaaactt ttaaaaaggt ttaaaagtaa aagacaataa  14100
cttgaacaca taattattta gaatgtttgg aaagaaacaa aaatttctaa gtctatctga  14160
ttctatttgc taattcttat ttgggttctg aatgcgtcta ctgtgatcca aacttagtat  14220
tgaatatatt gatatatctt taaaaaatta gtgttttttg aggaatttgt catcttgtat  14280
attataggtg ggattcttaa tagattctcc aaagatatag caattttgga tgaccttctg  14340
cctcttacca tatttgactt catccaggta tgtaaaaata agtaccgtta agtatgtctg  14400
tattattaaa aaaacaataa caaaagcaaa tgtgattttg ttttcatttt ttatttgatt  14460
gagggttgaa gtcctgtcta ttgcattaat tttgtaatta tccaaagcct tcaaaataga  14520
cataagttta gtaaattcaa taataagtca gaactgctta cctggcccaa acctgaggca  14580
atcccacatt tagatgtaat agctgtctac ttgggagtga tttgagaggc acaaaggacc  14640
atctttccca aaatcactgg cacagtgcac cagcatggca catgtataca tatgtaacta  14700
acctcgacaa tgtgcacatg taccctaaaa cttaaagtat aataaaaaaa ataaaaaaaa  14760
gtttgaggtg tttaaagtat gcaaaaaaaa aaaaagaaat aaatcactga cacactttgt  14820
ccactttgca atgtgaaaat gtttactcac caacatgttt tctttgatct tacagttgtt  14880
attaattgtg attggagcta tagcagttgt cgcagtttta caaccctaca tctttgttgc  14940
aacagtgcca gtgatagtgg cttttattat gttgagagca tatttcctcc aaacctcaca  15000
gcaactcaaa caactggaat ctgaaggtat gacagtgaat gtgcgatact catcttgtaa  15060
aaaagctata agagctattt gagattcttt attgttaatc tacttaaaaa aaattctgct  15120
tttaaacttt tacatcatat aacaataatt tttttctaca tgcatgtgta tataaaagga  15180
aactatatta caaagtacac atggattttt tttcttaatt aatgaccatg tgacttcatt  15240
ttggttttaa aataggtata tagaatctta ccacagttgg tgtacaggac attcatttat  15300
ttcaaagaat ggcaccagtg tgaaaaaaag
ctttttaacc aatgacattt gtgatatgat  15360
tattctaatt tagtcttttt caggtacaag atattatgaa aattacattt tgtgtttatg  15420
ttatttgcaa tgttttctat ggaaatattt cacaggcagg agtccaattt tcactcatct  15480
tgttacaagc ttaaaaggac tatggacact tcgtgccttc ggacggcagc cttactttga  15540
aactctgttc cacaaagctc tgaatttaca tactgccaac tggttcttgt acctgtcaac  15600
actgcgctgg ttccaaatga gaatagaaat gatttttgtc atcttcttca ttgctgttac  15660
cttcatttcc attttaacaa caggtactat gaactcatta actttagcta agcatttaag  15720
taaaaaattt tcaatgaata aaatgctgca ttctataggt tatcaatttt tgatatcttt  15780
agagtttagt aattaacaaa tttgttggtt tattattgaa caagtgattt ctttgaaatt  15840
tccattgttt tattgttaaa caaataattt ccttgaaatc ggtatatata tatatatagt  15900
atatatatat atatatatat atatatacat atatatatat agtattatcc ctgttttcac  15960
agttttaaaa accgatgcac acagattgtc gagtagcaat tctgtgattg aaggggaaat  16020
atgtcacctc ttcatactca tattggtgaa gggtcctagc ttcaaaatta atagattcct  16080
aaagagggga aatgaaacac cgcatttaca cacacacaca cacacacaca cacagagttc  16140
ctcttgtcgg taagtttgtt attacttata gaataatagt agaagagaca aatatggtac  16200
ctacccatta ccaacaacac ctccaatacc agtaacattt tttaaaaagg gcaacacttt  16260
cctaatattc aatcgctctt tgatttaaaa tcctggttga atacttacta tatgcagagc  16320
attattctat tagtagatgc tgtgatgaac tgagatttaa aaattgttaa aattagcata  16380
aaattgaaat gtaaatttaa tgtgatatgt gccctaggag aagtgtgaat aaagtcgttc  16440
acagaagaga gaaataacat gaggttcatt tacgtctttt gtgcatctat aggagaagga  16500
gaaggaagag ttggtattat cctgacttta gccatgaata tcatgagtac attgcagtgg  16560
gctgtaaact ccagcataga tgtggatagc ttggtaagtc ttatcatctt tttaactttt  16620
atgaaaaaaa ttcagacaag taacaaagta tgagtaatag catgaggaag aactatatac  16680
cgtatattga gcttaagaaa taaaacatta cagataaatt gagggtcact gtgtatctgt  16740
cattaaatcc ttatctcttc tttccttctc atagatagcc actatgaaga tctaatactg  16800
cagtgagcat tctttcacct gtttccttat tcaggatttt ctaggagaaa tacctagggg  16860
ttgtattgct gggtcatagg attcacccat gcttaacttc tcttcagtta aacttttaat  16920
tatatccaat tatttcctgt tagttcattg aaaagcccga caaataacca agtgacaaat  16980
agcaagtgtt gcattttaca agttattttt taggaagcat caaactaatt gtgaaattgt  17040
ctgccattct taaaaacaaa aatgttgtta tttttatttc agatgcgatc tgtgagccga  17100
gtctttaagt tcattgacat gccaacagaa ggtaaaccta ccaagtcaac caaaccatac  17160
aagaatggcc aactctcgaa agttatgatt attgagaatt cacacgtgaa gaaagatgac  17220
atctggccct cagggggcca aatgactgtc aaagatctca cagcaaaata cacagaaggt  17280
ggaaatgcca tattagagaa catttccttc tcaataagtc ctggccagag ggtgagattt  17340
gaacactgct tgctttgtta gactgtgttc agtaagtgaa tcccagtagc ctgaagcaat  17400
gtgttagcag aatctatttg taacattatt attgtacagt agaatcaata ttaaacacac  17460
atgttttatt atatggagtc attattttta atatgaaatt taatttgcag agtctgaact  17520
atatataaag gtcagtgata aaggaagtct gcatcagggg tccaattcct tatggccagt  17580
ttctctattc tgttccaagg ttgtttgtct ccatatatca acattggtca ggattgaaag  17640
tgtgcaacaa ggtttgaatg aataagtgaa aatcttccac tggtgacagg ataaaatatt  17700
ccaatggttt ttattgaagt acaatactga attatgttta tggcatggta cctatatgtc  17760
acagaagtga tcccatcact tttaccttat aggtgggcct cttgggaaga actggatcag  17820
ggaagagtac tttgttatca gcttttttga gactactgaa cactgaagga gaaatccaga  17880
tcgatggtgt gtcttgggat tcaataactt tgcaacagtg gaggaaagcc tttggagtga  17940
taccacaggt gagcaaaagg acttagccag aaaaaaggca actaaattat attttttact  18000
gctatttgat acttgtactc aagaaattca tattactctg caaaatatat ttgttatgca  18060
ttgctgtctt ttttttctcc agtgcagttt tctcataggc agaaaagatg tctctaaaag  18120
tttgggaatt ctttttaata ttctacaatt aacaattatc tcaatttctt tattctaaag  18180
acattggatt agaaaaatgt tcacaaggga ctccaaatat tgctgtagta tttgtttctt  18240
aaaagaatga tacaaagcag acatgataaa atattaaaat ttgagagaac ttgatggtaa  18300
gtacatgggt gtttcttatt ttaaaataat ttttctactt gaaatatttt acaatacaat  18360
aagggaaaaa taaaaagtta tttaagttat tcatactttc ttcttctttt cttttttgct  18420
atagaaagta tttatttttt ctggaacatt tagaaaaaac ttggatccct atgaacagtg  18480
gagtgatcaa gaaatatgga aagttgcaga tgaggtaagg ctgctaactg aaatgatttt  18540
gaaaggggta actcatacca acacaaatgg ctgatatagc tgacatcatt ctacacactt  18600
tgtgtgcatg tatgtgtgtg cacaacttta aaatggagta ccctaacata cctggagcaa  18660
caggtacttt tgactggacc tacccctaac tgaaatgatt ttgaaagagg taactcatac  18720
caacacaaat ggttgatatg gctaagatca ttctacacac tttgtgtgca tgtatttctg  18780
tgcacaactt caaaatggag taccctaaaa tacctggcgc gacaagtact tttgactgag  18840
cctacttcac agttgactat tttatgctat cttttgtcct cagtcatgac agagtagaag  18900
atgggaggta gcaccaagga tgatgtcata cctccatcct ttatgctaca ttctatcttc  18960
tgtctacata agatgtcata ctagagggca tatctgcaat gtatacatat tatcttttcc  19020
agcatgcatt cagttgtgtt ggaataattt atgtacacct ttataaacgc tgagcctcac  19080
aagagccatg tgccacgtat tgtttcttac tacttttgga tacctggcac gtaatagaca  19140
ctcattgaaa gtttcctaat gaatgaagta
caaagataaa acaagttata gactgattct  19200
tttgagctgt caaggttgta aatagacttt tgctcaatca attcaaatgg tggcaggtag  19260
tgggggtaga gggattggta tgaaaaacat aagctttcag aactcctgtg tttattttta  19320
gaatgtcaac tgcttgagtg tttttaactc tgtggtatct gaactatctt ctctaactgc  19380
aggttgggct cagatctgtg atagaacagt ttcctgggaa gcttgacttt gtccttgtgg  19440
atgggggctg tgtcctaagc catggccaca agcagttgat gtgcttggct agatctgttc  19500
tcagtaaggc gaagatcttg ctgcttgatg aacccagtgc tcatttggat ccagtgtgag  19560
tttcagatgt tctgttactt aatagcacag tgggaacaga atcattatgc ctgcttcatg  19620
gtgacacata tttctattag gctgtcatgt ctgcgtgtgg gggtctccca agatatgaaa  19680
taattgccca gtggaaatga gcataaatgc atatttcctt gctaagagtt cttgtgtttt  19740
cttccgaaga tagttttgca tgtttatagc cccaaataaa agaagtactg gtgattctac  19800
ataatgaaaa tgtactcatt tattaaagtt tctttgaaat atttgtcctg tttatttatg  19860
gatacttaga gtctacccca tggttgaaaa gctgattgtg cgtaacgcta tatcaacatt  19920
atgtgaaaag aacttaaaga aataagtaat ttaaagagat aatagaacaa tagacatatt  19980
atcaaggtaa atacagatca ttactgttct gtgatattat gtgtggtatt ttctttcttt  20040
tctagaacat accaaataat tagaagaact ctaaaacaag catttgctga ttgcacagta  20100
attctctgtg aacacaggat agaagcaatg ctggaatgcc aacaattttt ggtgagtctt  20160
tataacttta cttaagatct cattgccctt gtaattcttg ataacaatct cacatgtgat  20220
agttcctgca aattgcaaca atgtacaagt tcttttcaaa aatatgtatc atacagccat  20280
ccagctttac tcaaaatagc tgcacaagtt tttcactttg atctgagcca tgtggtgagg  20340
ttgaaatata gtaaatctaa aatggcagca tattactaag ttatgtttat aaataggata  20400
tatatacttt tgagcccttt atttgggacc aagtcataca aaatactcta ctgtttaaga  20460
ttttaaaaaa ggtccctgtg attctttcaa taactaaatg tcccatggat gtggtctgga  20520
caggcctagt tgtcttacag tctgatttat ggtattaatg acaaagttga gaggcacatt  20580
tcatttttct agccatgatt tgggttcagg tagtaccttt ctcaaccacc ttctcactgt  20640
tcttaaaaaa actgtcacat ggccaggcac agtggcttac atctgtaatc ccaatacttt  20700
gggaggctga ggtgggggga ttacttgagg ccaggaattc agatggtaga acctccttag  20760
agcaaaagga cacagcagtt aaatgtgaca tacctgattg ttcaaaatgc aaggctctgg  20820
acattgcatt ctttgacttt tattttcctt tgagcctgtg ccagtttctg tccctgctct  20880
ggtctgacct gccttctgtc ccagatctca ctaacagcca tttccctagg tcatagaaga  20940
gaacaaagtg cggcagtacg attccatcca gaaactgctg aacgagagga gcctcttccg  21000
gcaagccatc agcccctccg acagggtgaa gctctttccc caccggaact caagcaagtg  21060
caagtctaag ccccagattg ctgctctgaa agaggagaca gaagaagagg tgcaagatac  21120
aaggctttag agagcagcat aaatgttgac atgggacatt tgctcatgga attggagctc  21180
gtgggacagt cacctcatgg aattggagct cgtggaacag ttacctctgc ctcagaaaac  21240
aaggatgaat taagtttttt tttaaaaaag aaacatttgg taaggggaat tgaggacact  21300
gatatgggtc ttgataaatg gcttcctggc aatagtcaaa ttgtgtgaaa ggtacttcaa  21360
atccttgaag atttaccact tgtgttttgc aagccagatt ttcctgaaaa cccttgccat  21420
gtgctagtaa ttggaaaggc agctctaaat gtcaatcagc ctagttgatc agcttattgt  21480
ctagtgaaac tcgttaattt gtagtgttgg agaagaactg aaatcatact tcttagggtt  21540
atgattaagt aatgataact ggaactcagc ggtttatata agcttgtatt cctttttctc  21600
tcctctcccc atgatgttta gaaacacaac tatattgttt gctaagcatt ccaactatct  21660
catttccaag caagtattag aataccacag gaaccacaag actgcacatc aaaatatgcc  21720
ccattcaaca tctagtgagc agtcaggaaa gagaacttcc agatcctgga aatcagggtt  21780
agtattgtcc aggtctacca aaaatctcaa tatttcagat aatcacaata catcccttac  21840
ctgggaaagg gctgttataa tctttcacag gggacaggat ggttccctta cctgggaaag  21900
ggctgttata atctttcaca ggggacagga tggttccctt gatgaagaag ttgatatgcc  21960
ttttcccaac tccagaaagt gacaagctca cagacctttg aactagagtt tagctggaaa  22020
agtatgttag tgcaaattgt cacaggacag cccttctttc cacagaagct ccaggtagag  22080
ggtgtgtaag tagataggcc atgggcactg tgggtagaca cacatgaagt ccaagcattt  22140
agatgtatag gttgatggtg gtatgttttc aggctagatg tatgtacttc atgctgtcta  22200
cactaagaga gaatgagaga cacactgaag aagcaccaat catgaattag ttttatatgc  22260
ttctgtttta taattttgtg aagcaaaatt ttttctctag gaaatattta ttttaataat  22320
gtttcaaaca tatattacaa tgctgtattt taaaagaatg attatgaatt acatttgtat  22380
aaaataattt ttatatttga aatattgact ttttatggca ctagtatttt tatgaaatat  22440
tatgttaaaa ctgggacagg ggagaaccta gggtgatatt aaccaggggc catgaatcac  22500
cttttggtct ggagggaagc cttggggctg atcgaggttg ttgcccacag ctgtatgatt  22560
cccagccaga cacagcctct tagatgcagt tctgaagaag atggtaccac cagtctgact  22620
gtttccatca agggtacact gccttctcaa ctccaaactg actcttaaga agactgcatt  22680
atatttatta ctgtaagaaa atatcacttg tcaataaaat ccatacattt gtgtgaaact  22740
ttgttgtttt cagatgcgtt cacttgtcat gtttcatcag tctctcactc caatttctaa  22800
gcttcatgga acatgaaaca cgaatctgtc ttttagatat agcctc  22846

<210> SEQ ID NO 4
<211> LENGTH: 1480
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 4

Met Gln Arg Ser Pro Leu Glu Lys Ala Ser Val Val Ser Lys Leu Phe
1               5                   10                  15

Phe Ser Trp Thr Arg Pro Ile Leu Arg Lys Gly Tyr Arg Gln Arg Leu
            20                  25                  30

Glu Leu Ser Asp Ile Tyr Gln Ile Pro Ser Val Asp Ser Ala Asp Asn
        35                  40                  45

Leu Ser Glu Lys Leu Glu Arg Glu Trp Asp Arg Glu Leu Ala Ser Lys
    50                  55                  60

Lys Asn Pro Lys Leu Ile Asn Ala Leu Arg Arg Cys Phe Phe Trp Arg
65                  70                  75                  80

Phe Met Phe Tyr Gly Ile Phe Leu Tyr Leu Gly Glu Val Thr Lys Ala
                85                  90                  95

Val Gln Pro Leu Leu Leu Gly Arg Ile Ile Ala Ser Tyr Asp Pro Asp
            100                 105                 110

Asn Lys Glu Glu Arg Ser Ile Ala Ile Tyr Leu Gly Ile Gly Leu Cys
        115                 120                 125

Leu Leu Phe Ile Val Arg Thr Leu Leu Leu His Pro Ala Ile Phe Gly
    130                 135                 140

Leu His His Ile Gly Met Gln Met Arg Ile Ala Met Phe Ser Leu Ile
145                 150                 155                 160

Tyr Lys Lys Thr Leu Lys Leu Ser Ser Arg Val Leu Asp Lys Ile Ser
                165                 170                 175

Ile Gly Gln Leu Val Ser Leu Leu Ser Asn Asn Leu Asn Lys Phe Asp
            180                 185                 190

Glu Gly Leu Ala Leu Ala His Phe Val Trp Ile Ala Pro Leu Gln Val
        195                 200                 205

Ala Leu Leu Met Gly Leu Ile Trp Glu Leu Leu Gln Ala Ser Ala Phe
    210                 215                 220

Cys Gly Leu Gly Phe Leu Ile Val Leu Ala Leu Phe Gln Ala Gly Leu
225                 230                 235                 240

Gly Arg Met Met Met Lys Tyr Arg Asp Gln Arg Ala Gly Lys Ile Ser
                245                 250                 255

Glu Arg Leu Val Ile Thr Ser Glu Met Ile Glu Asn Ile Gln Ser Val
            260                 265                 270

Lys Ala Tyr Cys Trp Glu Glu Ala Met Glu Lys Met Ile Glu Asn Leu
        275                 280                 285

Arg Gln Thr Glu Leu Lys Leu Thr Arg Lys Ala Ala Tyr Val Arg Tyr
    290                 295                 300

Phe Asn Ser Ser Ala Phe Phe Phe Ser Gly Phe Phe Val Val Phe Leu
305                 310                 315                 320

Ser Val Leu Pro Tyr Ala Leu Ile Lys Gly Ile Ile Leu Arg Lys Ile
                325                 330                 335

Phe Thr Thr Ile Ser Phe Cys Ile Val Leu Arg Met Ala Val Thr Arg
            340                 345                 350

Gln Phe Pro Trp Ala Val Gln Thr Trp Tyr Asp Ser Leu Gly Ala Ile
        355                 360                 365

Asn Lys Ile Gln Asp Phe Leu Gln Lys Gln Glu Tyr Lys Thr Leu Glu
    370                 375                 380

Tyr Asn Leu Thr Thr Thr Glu Val Val Met Glu Asn Val Thr Ala Phe
385                 390                 395                 400

Trp Glu Glu Gly Phe Gly Glu Leu Phe Glu Lys Ala Lys Gln Asn Asn
                405                 410                 415

Asn Asn Arg Lys Thr Ser Asn Gly Asp Asp Ser Leu Phe Phe Ser Asn
            420                 425                 430

Phe Ser Leu Leu Gly Thr Pro Val Leu Lys Asp Ile Asn Phe Lys Ile
        435                 440                 445

Glu Arg Gly Gln Leu Leu Ala Val Ala Gly Ser Thr Gly Ala Gly Lys
    450                 455                 460

Thr Ser Leu Leu Met Met Ile Met Gly Glu Leu Glu Pro Ser Glu Gly
465                 470                 475                 480

Lys Ile Lys His Ser Gly Arg Ile Ser Phe Cys Ser Gln Phe Ser Trp
                485                 490                 495

Ile Met Pro Gly Thr Ile Lys Glu Asn Ile Ile Phe Gly Val Ser Tyr
            500                 505                 510

Asp Glu Tyr Arg Tyr Arg Ser Val Ile Lys Ala Cys Gln Leu Glu Glu
        515                 520                 525

Asp Ile Ser Lys Phe Ala Glu Lys Asp Asn Ile Val Leu Gly Glu Gly
    530                 535                 540

Gly Ile Thr Leu Ser Gly Gly Gln Arg Ala Arg Ile Ser Leu Ala Arg
545                 550                 555                 560

Ala Val Tyr Lys Asp Ala Asp Leu Tyr Leu Leu Asp Ser Pro Phe Gly
                565                 570                 575

Tyr Leu Asp Val Leu Thr Glu Lys Glu Ile Phe Glu Ser Cys Val Cys
            580                 585                 590

Lys Leu Met Ala Asn Lys Thr Arg Ile Leu Val Thr Ser Lys Met Glu
        595                 600                 605

His Leu Lys Lys Ala Asp Lys Ile Leu Ile Leu His Glu Gly Ser Ser
    610                 615                 620

Tyr Phe Tyr Gly Thr Phe Ser Glu Leu Gln Asn Leu Gln Pro Asp Phe
625                 630                 635                 640

Ser Ser Lys Leu Met Gly Cys Asp Ser Phe Asp Gln Phe Ser Ala Glu
                645                 650                 655

Arg Arg Asn Ser Ile Leu Thr Glu Thr Leu His Arg Phe Ser Leu Glu
            660                 665                 670

Gly Asp Ala Pro Val Ser Trp Thr Glu Thr Lys Lys Gln Ser Phe Lys
        675                 680                 685

Gln Thr Gly Glu Phe Gly Glu Lys Arg Lys Asn Ser Ile Leu Asn Pro
    690                 695                 700

Ile Asn Ser Ile Arg Lys Phe Ser Ile Val Gln Lys Thr Pro Leu Gln
705                 710                 715                 720

Met Asn Gly Ile Glu Glu Asp Ser Asp Glu Pro Leu Glu Arg Arg Leu
                725                 730                 735

Ser Leu Val Pro Asp Ser Glu Gln Gly Glu Ala Ile Leu Pro Arg Ile
            740                 745                 750

Ser Val Ile Ser Thr Gly Pro Thr Leu Gln Ala Arg Arg Arg Gln Ser
        755                 760                 765

Val Leu Asn Leu Met Thr His Ser Val Asn Gln Gly Gln Asn Ile His
    770                 775                 780

Arg Lys Thr Thr Ala Ser Thr Arg Lys Val Ser Leu Ala Pro Gln Ala
785                 790                 795                 800

Asn Leu Thr Glu Leu Asp Ile Tyr Ser Arg Arg Leu Ser Gln Glu Thr
                805                 810                 815

Gly Leu Glu Ile Ser Glu Glu Ile Asn Glu Glu Asp Leu Lys Glu Cys
            820                 825                 830

Phe Phe Asp Asp Met Glu Ser Ile Pro Ala Val Thr Thr Trp Asn Thr
        835                 840                 845

Tyr Leu Arg Tyr Ile Thr Val His Lys Ser Leu Ile Phe Val Leu Ile
    850                 855                 860

Trp Cys Leu Val Ile Phe Leu Ala Glu Val Ala Ala Ser Leu Val Val
865                 870                 875                 880

Leu Trp Leu Leu Gly Asn Thr Pro Leu Gln Asp Lys Gly Asn Ser Thr
                885                 890                 895

His Ser Arg Asn Asn Ser Tyr Ala Val Ile Ile Thr Ser Thr Ser Ser
            900                 905                 910

Tyr Tyr Val Phe Tyr Ile Tyr Val Gly Val Ala Asp Thr Leu Leu Ala
        915                 920                 925

Met Gly Phe Phe Arg Gly Leu Pro Leu Val His Thr Leu Ile Thr Val
    930                 935                 940

Ser Lys Ile Leu His His Lys Met Leu His Ser Val Leu Gln Ala Pro
945                 950                 955                 960

Met Ser Thr Leu Asn Thr Leu Lys Ala Gly Gly Ile Leu Asn Arg Phe
                965                 970                 975

Ser Lys Asp Ile Ala Ile Leu Asp Asp Leu Leu Pro Leu Thr Ile Phe
            980                 985                 990

Asp Phe Ile Gln Leu Leu Leu Ile Val Ile Gly Ala Ile Ala Val Val
        995                 1000                1005

Ala Val Leu Gln Pro Tyr Ile Phe Val Ala Thr Val Pro Val Ile Val
    1010                1015                1020

Ala Phe Ile Met Leu Arg Ala Tyr Phe Leu Gln Thr Ser Gln Gln Leu
1025                1030                1035                1040

Lys Gln Leu Glu Ser Glu Gly Arg Ser Pro Ile Phe Thr His Leu Val
                1045
   1050   - #             1055              - - Thr Ser Leu Lys Gly Leu Trp Thr Leu Arg Al - #a Phe Gly Arg Gln Pro                 1060      - #          1065       - #         1070                  - - Tyr Phe Glu Thr Leu Phe His Lys Ala Leu As - #n Leu His Thr Ala Asn             1075          - #      1080           - #     1085                      - - Trp Phe Leu Tyr Leu Ser Thr Leu Arg Trp Ph - #e Gln Met Arg Ile Glu         1090              - #  1095               - # 1100                          - - Met Ile Phe Val Ile Phe Phe Ile Ala Val Th - #r Phe Ile Ser Ile Leu      1105               1110 - #               1115  - #              1120          - - Thr Thr Gly Glu Gly Glu Gly Arg Val Gly Il - #e Ile Leu Thr Leu Ala                     1125  - #              1130   - #             1135              - - Met Asn Ile Met Ser Thr Leu Gln Trp Ala Va - #l Asn Ser Ser Ile Asp                 1140      - #          1145       - #         1150                  - - Val Asp Ser Leu Met Arg Ser Val Ser Arg Va - #l Phe Lys Phe Ile Asp             1155          - #      1160           - #     1165                      - - Met Pro Thr Glu Gly Lys Pro Thr Lys Ser Th - #r Lys Pro Tyr Lys Asn         1170              - #  1175               - # 1180                          - - Gly Gln Leu Ser Lys Val Met Ile Ile Glu As - #n Ser His Val Lys Lys      1185               1190 - #               1195  - #              1200          - - Asp Asp Ile Trp Pro Ser Gly Gly Gln Met Th - #r Val Lys Asp Leu Thr                     1205  - #              1210   - #             1215              - - Ala Lys Tyr Thr Glu Gly Gly Asn Ala Ile Le - #u Glu Asn Ile Ser Phe                 1220      - #          1225       - #         1230                  - - Ser Ile Ser Pro Gly Gln Arg Val Gly Leu Le - #u Gly Arg Thr Gly Ser             1235          - #      1240           - #     1245                      - - Gly Lys Ser Thr Leu Leu Ser Ala Phe Leu Ar - #g Leu Leu Asn Thr Glu         
1250              - #  1255               - # 1260                          - - Gly Glu Ile Gln Ile Asp Gly Val Ser Trp As - #p Ser Ile Thr Leu Gln      1265               1270 - #               1275  - #              1280          - - Gln Trp Arg Lys Ala Phe Gly Val Ile Pro Gl - #n Lys Val Phe Ile Phe                     1285  - #              1290   - #             1295              - - Ser Gly Thr Phe Arg Lys Asn Leu Asp Pro Ty - #r Glu Gln Trp Ser Asp                 1300      - #          1305       - #         1310                  - - Gln Glu Ile Trp Lys Val Ala Asp Glu Val Gl - #y Leu Arg Ser Val Ile             1315          - #      1320           - #     1325                      - - Glu Gln Phe Pro Gly Lys Leu Asp Phe Val Le - #u Val Asp Gly Gly Cys         1330              - #  1335               - # 1340                          - - Val Leu Ser His Gly His Lys Gln Leu Met Cy - #s Leu Ala Arg Ser Val      1345               1350 - #               1355  - #              1360          - - Leu Ser Lys Ala Lys Ile Leu Leu Leu Asp Gl - #u Pro Ser Ala His Leu                     1365  - #              1370   - #             1375              - - Asp Pro Val Thr Tyr Gln Ile Ile Arg Arg Th - #r Leu Lys Gln Ala Phe                 1380      - #          1385       - #         1390                  - - Ala Asp Cys Thr Val Ile Leu Cys Glu His Ar - #g Ile Glu Ala Met Leu             1395          - #      1400           - #     1405                      - - Glu Cys Gln Gln Phe Leu Val Ile Glu Glu As - #n Lys Val Arg Gln Tyr         1410              - #  1415               - # 1420                          - - Asp Ser Ile Gln Lys Leu Leu Asn Glu Arg Se - #r Leu Phe Arg Gln Ala      1425               1430 - #               1435  - #              1440          - - Ile Ser Pro Ser Asp Arg Val Lys Leu Phe Pr - #o His Arg Asn Ser Ser                     1445  - #              1450   - #             1455              - - Lys Cys Lys Ser Lys Pro Gln Ile Ala Ala Le - #u 
Lys Glu Glu Thr Glu                 1460      - #          1465       - #         1470                  - - Glu Glu Val Gln Asp Thr Arg Leu                                                 1475          - #      1480                                             - -  - - <210> SEQ ID NO 5                                                   <211> LENGTH: 28                                                              <212> TYPE: DNA                                                               <213> ORGANISM: Homo sapiens                                                   - - <400> SEQUENCE: 5                                                         - - cggaattctc gagatctttt tttttttt         - #                  - #                 28                                                                      - -  - - <210> SEQ ID NO 6                                                   <211> LENGTH: 19                                                              <212> TYPE: DNA                                                               <213> ORGANISM: Homo sapiens                                                   - - <400> SEQUENCE: 6                                                         - - atgaagtcca aggatttag             - #                  - #                      - # 19                                                                   - -  - - <210> SEQ ID NO 7                                                   <211> LENGTH: 10                                                              <212> TYPE: RNA                                                               <213> ORGANISM: Homo sapiens                                                   - - <400> SEQUENCE: 7                                                         - - agaccaugca                - #                  - #                      - #        10                                                                   - -  - - <210> SEQ ID NO 8                                                   <211> LENGTH: 24         
                                                     <212> TYPE: DNA                                                               <213> ORGANISM: Homo sapiens                                                   - - <400> SEQUENCE: 8                                                         - - gttggcatgc tttgatgacg cttc          - #                  - #                    24                                                                      - -  - - <210> SEQ ID NO 9                                                   <211> LENGTH: 24                                                              <212> TYPE: DNA                                                               <213> ORGANISM: Homo sapiens                                                   - - <400> SEQUENCE: 9                                                         - - gttttcctgg attatgcctg gcac          - #                  - #                    24                                                                      - -  - - <210> SEQ ID NO 10                                                  <211> LENGTH: 20                                                              <212> TYPE: DNA                                                               <213> ORGANISM: Homo sapiens                                                   - - <400> SEQUENCE: 10                                                        - - gcagagtacc tgaaacagga            - #                  - #                      - # 20                                                                   - -  - - <210> SEQ ID NO 11                                                  <211> LENGTH: 20                                                              <212> TYPE: DNA                                                               <213> ORGANISM: Homo sapiens                                                   - - <400> SEQUENCE: 11                                                        - - cattcacagt agcttaccca            - #                  - #             
         - # 20                                                                   - -  - - <210> SEQ ID NO 12                                                  <211> LENGTH: 20                                                              <212> TYPE: DNA                                                               <213> ORGANISM: Homo sapiens                                                   - - <400> SEQUENCE: 12                                                        - - taatggatca tgggccatgt            - #                  - #                      - # 20                                                                   - -  - - <210> SEQ ID NO 13                                                  <211> LENGTH: 20                                                              <212> TYPE: DNA                                                               <213> ORGANISM: Homo sapiens                                                   - - <400> SEQUENCE: 13                                                        - - acagtgttga atgtggtgca            - #                  - #                      - # 20                                                                   - -  - - <210> SEQ ID NO 14                                                  <211> LENGTH: 16                                                              <212> TYPE: DNA                                                               <213> ORGANISM: Homo sapiens                                                   - - <400> SEQUENCE: 14                                                        - - gttgttggcg gttgct             - #                  - #                      - #    16                                                                   - -  - - <210> SEQ ID NO 15                                                  <211> LENGTH: 16                                                              <212> TYPE: DNA                                                               <213> ORGANISM: Homo sapiens                   
                                - - <400> SEQUENCE: 15                                                        - - gttgttggag gttgct             - #                  - #                      - #    16                                                                   - -  - - <210> SEQ ID NO 16                                                  <211> LENGTH: 20                                                              <212> TYPE: DNA                                                               <213> ORGANISM: Homo sapiens                                                   - - <400> SEQUENCE: 16                                                        - - ggcataatcc aggaaaactg            - #                  - #                      - # 20                                                                   - -  - - <210> SEQ ID NO 17                                                  <211> LENGTH: 20                                                              <212> TYPE: DNA                                                               <213> ORGANISM: Homo sapiens                                                   - - <400> SEQUENCE: 17                                                        - - ggcataatcc aggaaaacta            - #                  - #                      - # 20                                                                   - -  - - <210> SEQ ID NO 18                                                  <211> LENGTH: 16                                                              <212> TYPE: DNA                                                               <213> ORGANISM: Homo sapiens                                                   - - <400> SEQUENCE: 18                                                        - - accttctcca agaact             - #                  - #                      - #    16                                                                   - -  - - <210> SEQ ID NO 19                                                  <211> LENGTH: 16   
                                                           <212> TYPE: DNA                                                               <213> ORGANISM: Homo sapiens                                                   - - <400> SEQUENCE: 19                                                        - - accttctcaa agaact             - #                  - #                      - #    16                                                                   - -  - - <210> SEQ ID NO 20                                                  <211> LENGTH: 16                                                              <212> TYPE: DNA                                                               <213> ORGANISM: Homo sapiens                                                   - - <400> SEQUENCE: 20                                                        - - acactgagtg gaggtc             - #                  - #                      - #    16                                                                   - -  - - <210> SEQ ID NO 21                                                  <211> LENGTH: 16                                                              <212> TYPE: DNA                                                               <213> ORGANISM: Homo sapiens                                                   - - <400> SEQUENCE: 21                                                        - - acactgaggg gaggtc             - #                  - #                      - #    16                                                                   - -  - - <210> SEQ ID NO 22                                                  <211> LENGTH: 16                                                              <212> TYPE: DNA                                                               <213> ORGANISM: Homo sapiens                                                   - - <400> SEQUENCE: 22                                                        - - tgctcgttga cctcca             - #                  - #          
            - #    16                                                                   - -  - - <210> SEQ ID NO 23                                                  <211> LENGTH: 16                                                              <212> TYPE: DNA                                                               <213> ORGANISM: Homo sapiens                                                   - - <400> SEQUENCE: 23                                                        - - tgctcgttga cctccc             - #                  - #                      - #    16                                                                   - -  - - <210> SEQ ID NO 24                                                  <211> LENGTH: 24                                                              <212> TYPE: DNA                                                               <213> ORGANISM: Homo sapiens                                                   - - <400> SEQUENCE: 24                                                        - - caactgtggt taaagcaata gtgt          - #                  - #                    24                                                                      - -  - - <210> SEQ ID NO 25                                                  <211> LENGTH: 24                                                              <212> TYPE: DNA                                                               <213> ORGANISM: Homo sapiens                                                   - - <400> SEQUENCE: 25                                                        - - gcacagattc tgagtaacca taat          - #                  - #                    24                                                                      - -  - - <210> SEQ ID NO 26                                                  <211> LENGTH: 15                                                              <212> TYPE: DNA                                                               <213> ORGANISM: Homo sapiens             
                                      - - <400> SEQUENCE: 26                                                        - - gactctcctt ttgga              - #                  - #                      - #    15                                                                   - -  - - <210> SEQ ID NO 27                                                  <211> LENGTH: 15                                                              <212> TYPE: DNA                                                               <213> ORGANISM: Homo sapiens                                                   - - <400> SEQUENCE: 27                                                        - - gactctcatt ttgga              - #                  - #                      - #    15                                                                   - -  - - <210> SEQ ID NO 28                                                  <211> LENGTH: 20                                                              <212> TYPE: DNA                                                               <213> ORGANISM: Homo sapiens                                                   - - <400> SEQUENCE: 28                                                        - - gtatggtttg gttgacttgg            - #                  - #                      - # 20                                                                   - -  - - <210> SEQ ID NO 29                                                  <211> LENGTH: 20                                                              <212> TYPE: DNA                                                               <213> ORGANISM: Homo sapiens                                                   - - <400> SEQUENCE: 29                                                        - - gtatggtttg gttgacttgt            - #                  - #                      - # 20                                                                   - -  - - <210> SEQ ID NO 30                                                  <211> 
LENGTH: 20                                                              <212> TYPE: DNA                                                               <213> ORGANISM: Homo sapiens                                                   - - <400> SEQUENCE: 30                                                        - - tttggtaata ggacatctcc            - #                  - #                      - # 20                                                                   - -  - - <210> SEQ ID NO 31                                                  <211> LENGTH: 20                                                              <212> TYPE: DNA                                                               <213> ORGANISM: Homo sapiens                                                   - - <400> SEQUENCE: 31                                                        - - tttggtaata agacatctcc            - #                  - #                      - # 20                                                                   - -  - - <210> SEQ ID NO 32                                                  <211> LENGTH: 13                                                              <212> TYPE: PRT                                                               <213> ORGANISM: Homo sapiens                                                   - - <400> SEQUENCE: 32                                                        - - Gly Thr Ile Lys Glu Asn Ile Ile Phe Gly Va - #l Ser Tyr                    1               5 - #                 10                                     - -  - - <210> SEQ ID NO 33                                                  <211> LENGTH: 12                                                              <212> TYPE: PRT                                                               <213> ORGANISM: Homo sapiens                                                   - - <400> SEQUENCE: 33                                                        - - Gly Thr Ile Lys Glu Asn Ile Phe Gly Val Se - #r 
Tyr                        1               5 - #                 10                                   __________________________________________________________________________
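
As an illustrative aside, several adjacent oligonucleotide entries in the listing above have equal length and appear to differ at a single base (for example SEQ ID NOS: 14 and 15). The sketch below is not part of the disclosure; it only shows one way such a difference might be located. The function name `single_base_difference` is hypothetical, and treating these two entries as a comparable pair is an assumption made for illustration.

```python
def single_base_difference(seq_a, seq_b):
    """Return (position, base_a, base_b) when two sequences of equal length
    differ at exactly one position (1-based); otherwise return None."""
    # The listing groups bases in blocks of ten, so strip the spaces first.
    a = seq_a.replace(" ", "").lower()
    b = seq_b.replace(" ", "").lower()
    if len(a) != len(b):
        return None
    diffs = [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return diffs[0] if len(diffs) == 1 else None


# Sequences copied from the listing; pairing them is an assumption.
seq_14 = "gttgttggcg gttgct"
seq_15 = "gttgttggag gttgct"

if __name__ == "__main__":
    print(single_base_difference(seq_14, seq_15))  # (9, 'c', 'a')
```

The position reported is counted within the oligonucleotide itself and says nothing about coordinates in SEQ ID NO:1.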

We claim:
 1. A DNA molecule comprising an intronless DNA sequence encoding a mutant CFTR polypeptide, said intronless DNA sequence varying from that of SEQ ID NO:1 in having nucleotide sequence variants resulting in a deletion or alteration of an amino acid in the encoded CFTR polypeptide, so that the sequence of said encoded CFTR polypeptide varies from that of SEQ ID NO:2 in an amino acid residue position selected from the group consisting of amino acid residues 85, 178, 455, 493, 507, 542, 549, 560, and 1092 of SEQ ID NO:2, and wherein an alteration at position 549 is either S549R or S549I.
 2. A DNA molecule encoding a mutant CFTR polypeptide, said DNA molecule comprising a sequence which varies from that of SEQ ID NO:1 in having a deletion or alteration of a nucleotide selected from those nucleotides at: (a) positions 556 and 3659 of SEQ ID NO:1; (b) the first nucleotide in intron 4 following nucleotide position 621(+1) located in the 3' end of exon 4 of SEQ ID NO:1; (c) the first nucleotide in intron 5 following nucleotide position 711(+1) located in the 3' end of exon 5 of SEQ ID NO:1; and (d) the last nucleotide of intron 10 before nucleotide position 1717(-1) located in the 5' end of exon 11 of SEQ ID NO:1.
 3. A DNA molecule comprising a DNA sequence selected from the group consisting of: (a) a portion of a DNA sequence of claim 1, said portion including at least 16 sequential nucleotides and including a nucleotide sequence variant of claim 1 resulting in a deletion or alteration of an amino acid in the encoded CFTR polypeptide, so that the sequence of said encoded CFTR polypeptide varies from that of SEQ ID NO:2 in an amino acid residue position selected from the group consisting of amino acid residues 85, 178, 455, 493, 507, 542, 549, 560 and 1092 of SEQ ID NO:2, and wherein an alteration at position 549 is either S549R or S549I; (b) a portion of a DNA sequence according to claim 2, said portion including at least 16 sequential nucleotides and including a nucleotide sequence variant which varies from that of SEQ ID NO:1 in having a deletion or alteration of a nucleotide selected from those nucleotides at: (i) positions 556 and 3659 of SEQ ID NO:1; (ii) the first nucleotide in intron 4 following nucleotide position 621(+1) located in the 3' end of exon 4 of SEQ ID NO:1; (iii) the first nucleotide in intron 5 following nucleotide position 711(+1) located in the 3' end of exon 5 of SEQ ID NO:1; and (iv) the last nucleotide of intron 10 before nucleotide position 1717(-1) located in the 5' end of exon 11 of SEQ ID NO:1; (c) DNA sequences encoding an epitope comprising a mutant CFTR polypeptide encoded by at least 18 sequential nucleotides in the selected sequence of those sequences recited above in part (a) or part (b).
 4. A probe comprising a nucleotide sequence selected from the group of DNA sequences consisting of: (a) a portion of a DNA sequence of claim 1, said portion including at least 16 sequential nucleotides and including a nucleotide sequence variant of claim 1; and (b) a portion of a DNA sequence of claim 2, said portion including at least 16 sequential nucleotides and including a nucleotide sequence variant of claim 2.
 5. A recombinant cloning vector comprising the DNA molecule of claim 3.
 6. A host cell transformed with the vector according to claim 5.
 7. A method for producing mutant CFTR polypeptide comprising the steps of: (a) culturing a host cell of claim 6 in a medium and under conditions favorable for expression of the mutant CFTR polypeptide; (b) isolating the expressed mutant CFTR polypeptide.
 8. A DNA molecule according to claim 1, wherein said alterations in amino acid residues of SEQ ID NO:2 are: G85E, G178R, A455E, Q493X, I507deletion, G542X, R560T, and Y1092X.
 9. A DNA molecule according to claim 2, wherein said deletion or alteration of said nucleotides of SEQ ID NO:1 are: A556deleted, C3659deleted, 621(+1G>T), 711(+1G>T) and 1717(-1G>A).
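
For readers who wish to cross-check the amino acid labels recited in claim 8 against the residue positions recited in claim 1, a minimal sketch follows. It is offered as an illustration only and forms no part of the claims. The names `parse_label` and `positions_outside_claim_1` are hypothetical, and the label grammar (one letter, a position, then one letter or the word "deletion") is an assumption inferred from the labels as written.

```python
import re

# Residue positions recited in claim 1 (copied from the claim text).
CLAIM_1_POSITIONS = {85, 178, 455, 493, 507, 542, 549, 560, 1092}

# Labels recited in claim 8 (copied from the claim text).
CLAIM_8_LABELS = ["G85E", "G178R", "A455E", "Q493X",
                  "I507deletion", "G542X", "R560T", "Y1092X"]

# Assumed grammar: original residue, position, then new residue or "deletion".
_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z]|deletion)$")


def parse_label(label):
    """Split a label such as 'G85E' into (original, position, change)."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"unrecognised label: {label!r}")
    original, position, change = match.groups()
    return original, int(position), change


def positions_outside_claim_1(labels):
    """Return the labels whose position is not among those recited in claim 1."""
    return [label for label in labels
            if parse_label(label)[1] not in CLAIM_1_POSITIONS]


if __name__ == "__main__":
    for label in CLAIM_8_LABELS:
        print(label, parse_label(label))
    print("outside claim 1:", positions_outside_claim_1(CLAIM_8_LABELS))
```

The trailing letter is kept exactly as written in the claim; the sketch does not interpret it.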